
Exploitation Platform Open Architecture

Title: Exploitation Platform Open Architecture
Author: Salvatore Pinto, RHEA
Issue: DRAFT
Revision: 3
Date of Issue: 22/04/2016

Contributors: Jordi Farres, ESA; Alessandro Marin, Solenix; Sveinung Loekken, ESA; Adrian Rose, CGI; Paulo Sacramento, Solenix; Cristiano Lopes, ESA

© 2016 by the European Space Agency. This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/.


Reason for change                        Issue   Revision   Date
Initial draft release                    DRAFT   1          23/10/2014
Internal review, first public release    DRAFT   2          23/03/2015
Minor changes and typo corrections       DRAFT   3          22/04/2016

Issue DRAFT, Revision 3

Reason for change                                                                   Paragraph(s)
Addition of interactive development tools                                           2.4.2
Updated references to interface document standards and profiles                    1.7
Workspace now supports user data stored within a dedicated user data basket        2.5.3
Geo Resource Browser now includes web GIS post-analysis capabilities               2.6.5
Collaboration approach extended to cover development and integration of software   2.4.1, 2.6.6
Minor changes and typo corrections                                                  -


Table of contents:

1 INTRODUCTION
  1.1 Background
  1.2 Document Purpose and Scope
  1.3 License
  1.4 Structure of the Document
  1.5 Acronyms and Abbreviations
  1.6 Applicable documents
  1.7 Reference documents
2 ARCHITECTURE
  2.1 Terminology
    2.1.1 Requirements verbal form
    2.1.2 Conventions
  2.2 General Overview
    2.2.1 User Access Portal
    2.2.2 Resource Management
    2.2.3 Service Integration
    2.2.4 Execution Environment
  2.3 Resource Management
    2.3.1 Catalogue
    2.3.2 Resource Ingestion
    2.3.3 Internal Repository
    2.3.4 External Repository
    2.3.5 Resource Access Gateway
  2.4 Service Integration
    2.4.1 DevBox
    2.4.2 Interactive Development
    2.4.3 Packaging Tool
    2.4.4 Import Tool
    2.4.5 Test/Debug Tool
  2.5 Execution Environment
    2.5.1 Execution Cluster
    2.5.2 Capacity Manager
    2.5.3 Workflow Manager
    2.5.4 App Manager
    2.5.5 Execution Gateway
  2.6 User Access Portal
    2.6.1 Marketplace
    2.6.2 Workspace
    2.6.3 Systematic Processing
    2.6.4 Collaboration Bucket
    2.6.5 Geo Resource Browser
    2.6.6 Support Tools
    2.6.7 Management Console
    2.6.8 Account and Monitor
    2.6.9 AAI
3 END-TO-END SCENARIOS
  3.1 EO Data Exploitation
    3.1.1 Processing Results
    3.1.2 EO Data
    3.1.3 Workflow
    3.1.4 App
    3.1.5 Article
  3.2 New EO Service Development
    3.2.1 Processing Service updates
    3.2.2 Security considerations
  3.3 New EO Product Development


1 INTRODUCTION

1.1 Background

The present ESA operations concept for the PDGS, and the consequent support to EO data exploitation, has evolved in support of the exploitation scenarios of ENVISAT, ERS and Third Party Missions (TPM). Opportunities stemming from the evolution of technology and corresponding shifts in user expectations, together with challenges stemming from the increasing volumes and complexity of data, drive the emergence of a new and complementary operations concept, based on the availability of collaborative Exploitation Platforms, with ESA in the role of coordinator of multiple, synergistic European efforts.

An Exploitation Platform is a virtual workspace, providing the user community with access to (i) large volumes of data (EO/non-space data), (ii) an algorithm development and integration environment, (iii) processing software and services (e.g. toolboxes, retrieval baselines, visualization routines), (iv) computing resources (e.g. hybrid cloud/grid), (v) collaboration tools (e.g. forums, wiki, knowledge base, open publications, social networking) and (vi) general operation capabilities (e.g. user management and access control, accounting, etc.). The platform thus provides a complete work environment for its users, enabling them to easily and effectively perform data-intensive research. The platform permits the execution of dedicated processing software close to the data, thereby avoiding moving large volumes of data through the network and spending time on developing tools for sourcing data, basic data manipulation, etc. Moreover, the platform offers a collaboration environment, where scientists can share their algorithms with the community, publish results and perform development.

Exploitation Platforms are usually tailored to a particular scope. For example, a Thematic Exploitation Platform (TEP) relates to a user community and research field (e.g. the Hydrology TEP), a Regional Exploitation Platform to a given area of interest (e.g. Europe, Sierra Nevada, Japan), and a Mission Exploitation Platform to a particular satellite mission (e.g. PROBA-V, SMOS, GOCE). To harmonize the different Exploitation Platforms and maximize the re-use of technology and components, thus reducing the cost required to develop, maintain and operate an Exploitation Platform, ESA defined the concept of a General Exploitation Platform Architecture, which includes a set of Common Components, to be reused by the individual platforms and tailored to the particular platform needs.

1.2 Document Purpose and Scope

The purpose of this document is to define a common architecture for the ESA Exploitation Platforms. This architecture is considered open to any body willing to implement Exploitation Platforms or to contribute to the definition and evolution of Exploitation Platforms.


The architecture is based on the requirements collected through the experience acquired in several Exploitation Platform projects, such as the SuperSite Exploitation Platform (SSEP), the ESA Thematic Exploitation Platforms and the ESA Mission Exploitation Platform precursor projects. The architecture described here defines a set of platform components, together with their scope, functionalities and interfaces. It is a general architecture, applicable to any Exploitation Platform. Reference component implementations are listed in the Exploitation Platform Common Core Components document [EPCCC].

1.3 License

This architecture is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. In summary, you are free to:

- Share — copy and redistribute the material in any medium or format.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms. Under the following terms:

- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices:

- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

To view a full copy of the license details, visit http://creativecommons.org/licenses/by-sa/4.0/legalcode .

1.4 Structure of the Document

The present document is broken down into the following major chapters:

1. Introduction: Document introduction, including terminology, acronyms and abbreviations.
2. Architecture: Overview of the Exploitation Platform system architecture, detailing components and component implementation requirements.
3. End-to-End Scenarios: Mapping of common Exploitation Platform operational usage scenarios to the architecture components involved.

1.5 Acronyms and Abbreviations

Acronyms and abbreviations used in this document are reported in the following table.

AAI        Authorization and Authentication Infrastructure
ACL        Access Control List
AOI        Area Of Interest
API        Application Programming Interface
ARQ        Architectural ReQuirement
CMS        Content Management System
CRUD       Create, Read, Update, Delete
DOI        Digital Object Identifier
DP         Data Provider
ECSS       European Cooperation for Space Standardization
ENVISAT    ENVIronmental SATellite
EO         Earth Observation
EP         Exploitation Platform
EPA        Exploitation Platform Architecture
ERS        European Remote Sensing
ESA        European Space Agency
FTP        File Transfer Protocol
FU         Final User
GIS        Geographical Information System
GUI        Graphical User Interface
HPC        High Performance Computing
HTC        High Throughput Computing
IDP        IDentity Provider
IETF       Internet Engineering Task Force
IPR        Intellectual Property Rights
OGC        Open Geospatial Consortium
OLA        Operations Level Agreement
OS         Operating System
OSS        Open Source Software
PDGS       Payload Data Ground Segment
PI         Principal Investigator
REQ        Requirement
RLS        Resource Location System
SSEP       SuperSite Exploitation Platform
TEP        Thematic Exploitation Platform
TPM        Third Party Mission
UAP        User Access Portal
URI        Uniform Resource Identifier
UUID       Universally Unique Identifier
VM         Virtual Machine

Table 1-1 List of acronyms and abbreviations

1.6 Applicable documents

[ECSS-REQS]  Technical requirements specification, ECSS-E-ST-10-06C

Table 1-2 List of applicable documents

1.7 Reference documents

[EPRCS]  Exploitation Platform Resource Catalogue Interface, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/4124505a-b25a-4f4e-b202-5a3270232d24

[EPRIS]  Exploitation Platform Resource Ingestion Interface, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/35acca85-0806-4a74-9405-651379d2283b

[EPRAS]  Exploitation Platform Resource Access Interface, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/35acca85-0806-4a74-9405-651379d2283b

[EPPES]  Exploitation Platform Processing Service Execution Interface, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/264eb32c-a8cc-4b91-8ef1-33de05b09eef

[EPPPS]  Exploitation Platform Processing Service Packaging, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/6cd9c352-84a4-4e3d-9fc7-bf54bf54fd12

[EPVEP]  Exploitation Platform Virtual Experiment Packaging, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/71fb0344-cf00-4bb4-95d8-273030da7dc5

[EPCCC]  Exploitation Platform Common Core Components, DRAFT 1, 22/04/2016. Available at https://tep.eo.esa.int/documents/20181/30543/x/eb905a75-c981-4298-816f-a32242c93b68

Table 1-3 List of reference documents


2 ARCHITECTURE

2.1 Terminology

Due to the particular concept of the Exploitation Platform, some of the words used in this document have a particular meaning, slightly different from the common one. These terms, together with the new ones defined by this document, are listed in Table 2-1. Within the table, multiple terms sharing the same definition are considered synonyms.

Final User (FU): Consumer of the platform processing services and/or of the products stored and generated via the platform. The final user does not perform any integration or development work within the platform, but may start processing based on already integrated Apps or Workflows. Final users usually interact with the platform via a web-browser interface and may be registered or anonymous, according to the particular service permissions.

Scientific User, Principal Investigator (PI), Service Provider (SP): Expert user performing development and integration of algorithms (in the form of new Apps or Workflows) within the platform, to serve the final users.

Data Provider (DP): User providing data to the platform. A Data Provider usually holds IPR on the data, or represents the IPR data owner. DPs are the ones entitled to define data access permissions (e.g. download, processing). DPs have dedicated interfaces to set up such permissions and interact only with such interfaces.

Operator (OP): Responsible for operational tasks on the platform: ensuring the platform functions correctly, managing users and their roles, and supporting users in their usage of the platform. Operators have dedicated interfaces for management purposes.

Administrator: Responsible for the deployment of the platform (component installation, hardware and software configuration, etc.). Administrator tasks are not directly covered in this architecture document, as deployment considerations are issued in separate documentation.

Resource: Anything related to the execution of a Processing Service; a resource can be Data (auxiliary or input), processing Results, an App or a Workflow.

Data: Generic term used in this document to represent input, output and auxiliary data in use within the platform. The term Data refers to a set of measurements or to the result of an analysis performed on such measurements.

EO data: A particular set of data coming from an Earth Observation (EO) satellite measurement. This data has a set of standard metadata, such as measurement start time, stop time and spatial coverage, and particular metadata dependent on the sensor that acquired the measurement. EO data is usually organised into Levels, from Level 0 or RAW data (output of the satellite sensor), to Level 1 (calibrated and geo-located measurements), to Level 2 and above (derived products). EO data is the usual input data of Exploitation Platform processing.

Auxiliary data: Ancillary data used in processing or analysis operations. Auxiliary data usually comes from different sources than the input data. The task of defining, for a given input data, the auxiliary data to be used in the processing is named Orchestration.

ICT Resource: Hardware resource required for the deployment of an architecture component.

Publish: Act of ingesting a Resource into the platform.

Query: Abbreviation of resource query: a search operation performed with the scope of listing the resources that satisfy a given set of search parameters (named Query parameters).

Result: Output of a processing performed via the platform. The term Result refers to a Result Data Product, if not otherwise specified (e.g. results collection or results metadata).

Product, Granule: Representation of a resource within a file.

Collection, Dataset series: Set of products. For data products, Collections usually group products belonging to the same satellite sensor, acquisition mode and processing level.

Metadata: Information associated with a given resource, which is relevant to the proper analysis of the resource itself.

Preview, Quicklook: A representation of the resource in a synthetic form, for quick visualization of the product contents.

Processing Service: A processing or analysis task offered by the platform to the Final User. A Processing Service performs a set of operations that can be specified interactively by the user (in the case of an App) or pre-defined according to given processing parameters (in the case of a Workflow/Job).

App: Processing Service provided in the form of an application directly instantiated in a Software-as-a-Service delivery model. Apps are usually used for data analysis or other purposes. Differently from a Workflow, an App is managed completely by the user, who can start it, stop it and perform interactively any operations the underlying software application allows.

Workflow: Processing Service provided in the form of a set of pre-defined operations. Workflows have pre-defined input and output data types, orchestration logic and scope of processing, which may be customised by the user via a set of processing parameters. A Workflow follows the Software-as-a-Service delivery model but, differently from an App, a Workflow is not interactive and stops automatically when the pre-defined processing operations are complete.

Job: A pre-defined operation within a workflow.

Virtual Experiment: Reproducible representation of a processing activity, which includes information about the input data, auxiliary data, processing parameters and output data.

On-demand processing: Processing performed on demand by the user over a limited set of data products. On-demand processing differs from systematic processing, which does not pose limits on the number or size of the input data to be processed. On-demand processing is the default processing service execution mode for Final Users within the platform.

Systematic processing: Massive processing performed on large amounts of data, above the limits of a single on-demand processing. Systematic processing can be performed in two sub-modes: bulk or near-real-time processing.

Bulk processing: One-shot processing of a large amount of data, above the limits of a single on-demand processing. Bulk processing is normally executed by splitting it into chunks which can be submitted as single on-demand processing executions (see the sketch after this table).

Near-real-time processing: Systematic processing performed on data as soon as it becomes available to the platform. Near-real-time processing is performed by launching on-demand processing via a trigger, which can fire each time new data is available or at a given time each day, week or month.

Table 2-1 List of common terms
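The bulk-processing strategy named in Table 2-1 (splitting a large request into on-demand-sized chunks) can be pictured with a short sketch. This is a minimal illustration: the ON_DEMAND_LIMIT value and the submit_on_demand() call are invented for the example and are not defined by this architecture.

```python
from typing import Iterable, List

# Minimal sketch of the bulk-processing strategy from Table 2-1: a bulk
# request is split into chunks, each small enough to be submitted as a
# single on-demand processing execution. ON_DEMAND_LIMIT and
# submit_on_demand() are illustrative assumptions.

ON_DEMAND_LIMIT = 100  # assumed per-execution product limit


def chunk(products: List[str], size: int = ON_DEMAND_LIMIT) -> Iterable[List[str]]:
    """Yield successive chunks of at most `size` products."""
    for start in range(0, len(products), size):
        yield products[start:start + size]


def submit_on_demand(batch: List[str]) -> str:
    """Placeholder for the platform's on-demand execution request."""
    print(f"submitting on-demand execution for {len(batch)} products")
    return f"exec-{id(batch):x}"


def submit_bulk(products: List[str]) -> List[str]:
    """Split a bulk request into on-demand executions; return their IDs."""
    return [submit_on_demand(batch) for batch in chunk(products)]
```

A near-real-time trigger would, analogously, invoke submit_on_demand() each time new data arrives, rather than over a pre-assembled product list.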

2.1.1 Requirements verbal form

The terminology of the requirements follows the "Required verbal form" of the ECSS Technical requirements specification standard [ECSS-REQS], for which the following requirement levels are defined: "shall", "should", "may" and "can".

- "shall" is considered an imperative, which shall be taken into account.
- "should" is considered a recommendation, which can be left unmet only if there is a strong rationale behind this choice and the implications of not taking the requirement into account are fully understood. The rationale can be specified in the requirement itself (e.g. via an "if" statement) or can come from the requirement implementation (e.g. incompatibility with another "shall" requirement of the particular thematic Exploitation Platform implementation). If such a requirement cannot be met, a justification shall be provided at implementation level.
- "may" is considered a permission, which may be implemented but is not recommended. In this case, a justification shall be provided at implementation level for the need to implement the requirement.
- "can" is considered a possibility or capability, which can be implemented but is not mandatory. "can" is considered a light recommendation and no justification is required if the requirement is not implemented.

Each Architectural Requirement (ARQ) provided in this document has a unique identifier, following the schema [ARQ XXXyyL], where XXX is a three-letter abbreviation of the component name to which the requirement relates, yy is the number of the requirement, and L is the requirement level, which can be S for shall, H for should, M for may and C for can classes of requirements.
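The identifier schema is regular enough to validate mechanically. The following is a small Python sketch of a parser for ARQ identifiers; the function name and the returned fields are illustrative only, not part of the architecture.

```python
import re

# Sketch of a validator for the [ARQ XXXyyL] identifier schema defined
# above: XXX is the three-letter component abbreviation, yy the requirement
# number, L the requirement level (S=shall, H=should, M=may, C=can).

ARQ_PATTERN = re.compile(
    r"\[ARQ (?P<component>[A-Z]{3})(?P<number>\d{2})(?P<level>[SHMC])\]"
)

LEVELS = {"S": "shall", "H": "should", "M": "may", "C": "can"}


def parse_arq(identifier: str) -> dict:
    """Split an ARQ identifier into component, number and level."""
    match = ARQ_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid ARQ identifier: {identifier!r}")
    fields = match.groupdict()
    fields["level_name"] = LEVELS[fields["level"]]
    return fields


# parse_arq("[ARQ CAT01S]") returns
# {'component': 'CAT', 'number': '01', 'level': 'S', 'level_name': 'shall'}
```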

2.1.2 Conventions

For the figures present in this document, the following conventions are in use:

- Interfaces, components and connections related mainly to operations and tasks instantiated by final users are depicted in blue (e.g. getting data, submitting processing, getting processing results).
- Interfaces, components and connections related mainly to operations and tasks instantiated by data providers are depicted in red (e.g. publishing of data, setting of data access control lists).
- Interfaces, components and connections related mainly to operations and tasks instantiated by operators are depicted in purple (e.g. modifying user authorizations, retrieving usage logs).
- Interfaces, components and connections related mainly to operations and tasks instantiated by Principal Investigators are depicted in green (e.g. integration of new algorithms).
- An arrow depicts a flow of information between two components. An arrow usually refers to one or more standard interfaces which are invoked to perform the information exchange. The arrow direction follows the flow of information; it is not related to the flow of the request.
- A rectangular box constitutes a system component or the logical representation of a task. Rectangular boxes with underlying replicas represent components which may have more than one instance, with multiple deployments within the same platform.


2.2 General Overview

An Exploitation Platform is a complex environment, where several actors are involved with different roles and several functionalities are offered to the users, tailored to their specific role in the platform. These functionalities can be summarised as follows:

- "Data discovery", which allows the final user to search for data products (input or published output) based on data metadata and custom search rules, and optionally download them.
- "Data management", for the data provider to upload data into the platform, edit and delete it, set up data metadata, set up rights for data access and perform other data-management-related tasks.
- "On-demand processing", which allows the final user to run Workflow or App processing services.
- "Massive processing", which allows the final user to perform bulk processing over large datasets or systematic processing upon new acquisitions.
- "Development and integration system", for the Principal Investigator to integrate his or her software into the platform.
- "Processing services management", for the Principal Investigator to upload Apps and Workflows into the platform, manage versioning and metadata, set up rights for processing service access and perform other processing service management tasks.
- "Collaboration tools" and "documentation and support tools", for the operator to provide support to the other users.
- "Services marketplace", for the final user to discover and compare all the platform services, from processing services to data discovery and collaboration tools.
- "Single access portal", as a single entry point to access all the thematic platform services, related documentation and support.
- "Operator interface", where the operator users can perform all the standard operation tasks, such as managing users, groups and permissions, configuring the system and all its components, and finding information about the system configuration parameters and their possible values.

To have a simple view of the Exploitation Platform system, the Exploitation Platform Architecture is split among four macro-components. A macro-component is a logical collection of components that have similar or strictly related functions and implement a homogeneous set of Exploitation Platform services. A macro-component is just a logical representation used to easily describe the architecture, thus it has no direct relation to the system software implementations.


Figure 2-1 Architecture macro-components general overview

A macro-component may exist separately from the others, offering its functionalities directly to the users. A macro-component implementation may cover only part of the macro-component's functional interfaces and services, for example only the functions of interest for the particular Exploitation Platform. Also, a macro-component can be deployed in multiple instances, tailored to serve different communities and federated across different resource providers. The four macro-components implemented in the architecture, with a basic view of their relations, are depicted in Figure 2-1 and briefly described in the following paragraphs. A detailed description of each macro-component, with its internal components and interfaces, is provided in the following chapters.

2.2.1 User Access Portal

The User Access Portal macro-component provides the interface to the final users and the system operators. This macro-component also implements a set of common underlying services used by the other components, such as authentication, accounting, monitoring and collaboration tools. The User Access Portal macro-component implements the required "single access portal", "collaboration tools", "documentation and support tools", "services marketplace" and "operator interface" functionalities. This macro-component connects to the Resource Management macro-component to get information about the data, App and Workflow resources required for a given processing service, and to the Execution Environment macro-component to execute processing services.

2.2.2 Resource Management

The Resource Management macro-component handles the resources available in the platform. Its components implement the "data discovery", "data management" and "processing services management" functionalities. The Resource Management macro-component takes care of resource storage, resource location, the resource catalogue, metadata harvesting, resource visualization (for quick-visualization purposes) and all the other resource management tasks. The Resource Management macro-component takes as input a Query, which is a request for resources fulfilling different selection criteria, and returns the resource itself, in the form of one or multiple data products, data results, App or Workflow packages. Together with the resource, the query returns the resource metadata, which usually includes the preview associated with the resource for quick visualization purposes. This macro-component does not make any distinction between the kinds of resources it stores, other than in the schema of the metadata associated with them. In some implementations of an Exploitation Platform, it can therefore store not only Data, Results, Workflows and Apps, but also publications, virtual experiments and other collaborative resources. The Resource Management macro-component also takes as input a Publish request, which is the request, performed by the Data Provider or the Principal Investigator, to insert one or more resources into the system (e.g. new data, new Workflows or new Apps).

2.2.3 Service Integration

The Service Integration macro-component provides the PI with a framework to integrate his or her own application, algorithm and/or software into the platform as a new processing service (Workflow or App). This macro-component implements the required "development and integration system" functionality. The PI uses this macro-component to describe the application logic and package the application into a Workflow or an App. This macro-component does not include processing functionalities, but sends Execution Requests to the Execution Environment macro-component for testing and debugging purposes.

2.2.4 Execution Environment

The Execution Environment macro-component provides the platform with an environment to run processing services. This macro-component implements both the "on-demand processing" and "massive processing" functionalities. The Execution Environment macro-component takes as input an Execution Request, which can be for either a Workflow or an App processing. In the case of a Workflow processing, the macro-component performs the pre-defined processing operations according to the specified processing parameters, without further interaction with the Final User, and then publishes the Results into the Resource Management macro-component. In the case of an App processing, the Execution Environment interacts with the Final User during the entire processing, via the User Access Portal macro-component, to interactively perform both the processing operations and the publishing of the results. This macro-component receives the Workflow, App and data required to run the processing from the Resource Management macro-component. The Execution Environment macro-component can also perform Queries to the Resource Management macro-component to look for resources required by the processing which are not directly specified in the Execution Request.
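The Workflow/App distinction just described can be pictured as a simple dispatch. This is a minimal sketch under assumed names (ExecutionRequest and the placeholder helpers are invented for illustration); the real execution interface is the one specified in the [EPPES] document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Minimal sketch of the Workflow/App dispatch described above. All names
# are illustrative assumptions, not the [EPPES] interface itself.


@dataclass
class ExecutionRequest:
    service_type: str                           # "workflow" or "app"
    service_id: str
    parameters: Dict[str, Any] = field(default_factory=dict)


def run_predefined_operations(request: ExecutionRequest) -> Dict[str, Any]:
    """Placeholder: run the Workflow's pre-defined operations."""
    return {"service": request.service_id, "outputs": []}


def publish_results(results: Dict[str, Any]) -> None:
    """Placeholder: publish Results to Resource Management."""
    print(f"publishing results of {results['service']}")


def open_interactive_session(request: ExecutionRequest) -> None:
    """Placeholder: keep an App session open via the User Access Portal."""
    print(f"opening interactive session for {request.service_id}")


def execute(request: ExecutionRequest) -> None:
    if request.service_type == "workflow":
        # A Workflow runs to completion without user interaction,
        # then its Results are published automatically.
        publish_results(run_predefined_operations(request))
    elif request.service_type == "app":
        # An App stays interactive: the Final User drives both the
        # processing and the publishing of the results.
        open_interactive_session(request)
    else:
        raise ValueError(f"unknown service type: {request.service_type}")
```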

2.3 Resource Management

As reported in the general overview, the Resource Management macro-component takes care of the overall resource (data, results, App and Workflow) management tasks within the infrastructure. The ultimate scope of this component is to provide the final user and the other macro-components with coherent access to the different resources used in the Exploitation Platform. Considering the variety of the resources required by an Exploitation Platform, which can be input data, auxiliary data, results, Apps and Workflows, and the spread of these over repositories with different underlying technologies, the key element in the design of this macro-component is flexibility. The macro-component, in fact, needs to manage different resource formats and different resource locations, and to harmonise the different access policies and interfaces.

The architecture proposed here for this macro-component is independent of the type of resource the component is going to serve. This does not imply that, in a particular Exploitation Platform deployment, a single generic Resource Management macro-component instance will be deployed; different instances can be deployed separately to serve different homogeneous types of resources (data, results, workflows, apps).

Figure 2-2 provides a view of this macro-component architecture, with the components involved in the Query (search for resources meeting a given set of search criteria) in blue and the ones involved in the Publish task (addition of data or workflows into the platform) in red/green.

Figure 2-2 Resource Management macro-component architecture

The main component of the Resource Management is the Catalogue. This is a database storing the information about the resources present in the infrastructure. The catalogue is generic and may serve EO data, auxiliary data, results, workflows or apps. These resources are usually represented with different metadata schemas. Resources are organised in collections, which represent homogeneous sets. The catalogue responds to a given search Query, returning the metadata for the resource elements fulfilling the query parameters. The Catalogue also acts as a Resource Location System (RLS) and as a Quick Visualization Tool, returning within the resource metadata respectively a location to retrieve the resource (resource location) and a preview of the resource itself. The preview is an optional metadata element, usually implemented by the data and results classes of resources. The resource location is used to access the resource from a specific repository (which can be internal or external to the platform) via a Resource Access Gateway. This component takes care of file ACLs and the harmonization of the data access interface. The Catalogue is fed by the Resource Ingestion component. This component receives Publish requests, extracts the metadata from the resource, generates the preview, pushes the resource into the storage and the metadata (including resource location and preview) into the Catalogue. The storage can be an internal storage, managed by the platform itself, or an external storage. Interfaces for connection to external storage are not defined by the platform itself, and internal repositories may also expose different interfaces, including offline and online access support. Thus, the Resource Access Gateway acts as an abstraction layer, to retrieve the resource according to an interface compatible with the other components of the platform.

2.3.1 Catalogue

The catalogue is the most important component of the resource management system. There may be several implementations of a Catalogue component; nevertheless, to better understand the logic behind the catalogue, a sample architecture of a catalogue component is depicted in Figure 2-3. As shown in the figure, the catalogue is constituted by a Database, which stores the metadata associated with the resources. A Query Engine handles the search for resources according to specific user parameters. Resource metadata are custom and dependent on the particular collection and resource class. However, some standard resource metadata is provided by the system, according to the schema of the resource class defined in the Catalogue interface. For example, for data resources, data products will have product size, geographical and temporal location metadata.


Figure 2-3 Catalogue component sample architecture

A special metadata element shared by all classes is the resource location, which is a reference to the resource in the repository. This metadata is handled by a Resource Location System, which produces the best location for accessing the resource according to the requester (e.g. source IP of the request, username of the request, custom query fields, etc.). Another special metadata element is the resource preview, which is a representation of the resource contents in a standard format that is easy for the final user to visualize. This metadata is handled by a Quick Visualization system, usually a web map server, which stores and produces the resource preview for the final user. The nature of the preview depends on the class of resource. For data resources, it will usually be an RGB composite representation of the observation; for results, an image representing the results (e.g. a graph or RGB composite); for an App, a screenshot of the application GUI; for a Workflow, a mnemonic image associated with the pre-defined processing operations of the Workflow.

The overall catalogue component is managed via a management interface, which takes care of all the management operations (insertion of new resource records, deletion or modification of existing ones). The management interface also provides statistics about the resources available on the platform, generated via a dedicated Statistics Generator system.

In more detail, the requirements of a catalogue component are:

[ARQ CAT01S] Catalogue shall store internally information (geospatial and not) about resources present in the platform (e.g. via a dedicated database, with geospatial capabilities).
[ARQ CAT02S] Catalogue shall be able to organize resources within collections. Resources of the same collection have the same metadata model and query parameters.
[ARQ CAT03S] Catalogue shall be able to manage custom metadata models and custom query rules for the resources in a given collection.

[ARQ CAT04S] Resource metadata models and custom query rules shall be configurable for each collection by the Data Provider and Operator users.
[ARQ CAT05C] Catalogue can support the possibility for registered users to rate and comment on entries.
[ARQ CAT06S] Catalogue query interface and returned metadata shall be compliant with the standards described in the EP Resource Catalogue Interface [EPRCS] document, query interface section.
[ARQ CAT07S] It shall be possible for the Operator to define additional custom collection metadata and additional custom collection query rules, beside the ones specified in the EP Resource Catalogue Interface [EPRCS] document, query interface section.
[ARQ CAT08S] Catalogue shall contain a Resource Location System, which returns a reference for the resource to be used.
[ARQ CAT09S] Catalogue resource metadata shall contain a reference to the RLS in the form of a URI, according to the standards described in the EP Resource Catalogue Interface [EPRCS] document, query interface section.
[ARQ CAT10S] RLS location resolution shall take into account the source IP of the query request and the source client (user/agent) of the request, in order to report the best resource for the processing.
[ARQ CAT11H] RLS location resolution should support custom rules, pre-defined by the Operator, to best resolve the resource location for the processing.
[ARQ CAT12S] Catalogue shall provide a resource preview Quick Visualization system, to display quick-looks and previews of the stored products to the final users.
[ARQ CAT13S] Catalogue resource metadata shall provide a reference to the Preview in the form of a URI, according to the standards described in the EP Resource Catalogue Interface [EPRCS] document, query interface section.
[ARQ CAT14S] Catalogue resource previews shall be provided via the interface standards specified in the EP Resource Catalogue Interface [EPRCS] document, preview interface section.
[ARQ CAT15S] Catalogue shall provide a management interface, able to perform CRUD operations on the managed resources and associated metadata.
[ARQ CAT16S] Catalogue management interface shall be compliant with the standards described in the EP Resource Catalogue Interface [EPRCS] document, management interface section.
[ARQ CAT17S] Catalogue shall support authorization at collection and resource level, restricting visualization of all metadata, visualization of a customizable set of metadata, visualization of single resource entries or listing of entries only to authorized users or groups.
[ARQ CAT18S] Catalogue shall produce statistics about the resources present in the catalogue.

[ARQ CAT20H] Statistics interface should be compliant with the standards described in the EP Resource Catalogue Interface [EPRCS] document, statistics interface section.
[ARQ CAT21C] Generated statistics can be displayed like the previews, via the Quick Visualization sub-component and according to the standards described in the EP Resource Catalogue Interface [EPRCS] document, preview interface section.
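The catalogue behaviour mandated above (collections, metadata queries, RLS resolution per ARQ CAT01S-CAT11H) can be pictured with a small in-memory sketch. The field names, the network rule and the URIs are invented for illustration; real implementations follow the [EPRCS] interface standards.

```python
import ipaddress
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Minimal in-memory sketch of the Catalogue plus its Resource Location
# System: resources grouped into collections, queried by metadata
# parameters, with the RLS picking the best replica per requester.

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # assumed processing LAN


@dataclass
class CatalogueEntry:
    collection: str
    metadata: Dict[str, Any]           # collection-specific metadata model
    replicas: Dict[str, str]           # location name -> URI
    preview: Optional[str] = None      # Quick Visualization URI


class Catalogue:
    def __init__(self) -> None:
        self.entries: List[CatalogueEntry] = []

    def put(self, entry: CatalogueEntry) -> None:
        self.entries.append(entry)

    def query(self, collection: str, **params: Any) -> List[CatalogueEntry]:
        """Return entries of a collection whose metadata match all params."""
        return [
            e for e in self.entries
            if e.collection == collection
            and all(e.metadata.get(k) == v for k, v in params.items())
        ]

    def resolve_location(self, entry: CatalogueEntry, source_ip: str) -> str:
        """RLS resolution (CAT10S): pick the best replica for the requester."""
        if ipaddress.ip_address(source_ip) in INTERNAL_NET:
            return entry.replicas.get("internal", entry.replicas["public"])
        return entry.replicas["public"]
```

A call such as catalogue.query("eo-data", sensor="SAR") would return the matching entries; resolve_location() would then hand, for example, a fast internal URI to a processing node and a public HTTP URI to an external user.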

2.3.2 Resource Ingestion

The Resource Ingestion component takes care of the ingestion of new resources into the platform. As for the catalogue, there may be several implementations of a resource ingestion component, depending also on the class type of the resource. Nevertheless, to better understand the logic behind the system, a sample data ingestion flow is depicted in Figure 2-4.

Figure 2-4 Data ingestion component sample flow

The ingestion process starts with a publish request performed by the data provider, operator or final user. The publish request contains a set of information, such as the kind of resource (e.g. EO data, auxiliary data, workflow, app) and a reference to the resource to be ingested. At first, the publish request needs to be authorized and the quota assigned to the user shall be verified. This is especially important for the Results type of resources, which are generated by the processing service executions within the platform. Then, the publish request passes through a set of steps, each one implemented by a set of modules selected according to the information provided within the publish request (e.g. type of resource, resource collection, etc.). The first module implements the preparation step, which retrieves the referenced resource in a common format (e.g. a set of files). This resource is passed to a metadata extractor, which extracts all the relevant metadata (e.g. package size, EO data metadata, App version, etc.). According to the metadata and the resource itself, the preview generator produces a quicklook of the resource for easier visualization by the final user. The resource format is then optionally converted into a format understandable by the platform (e.g. by compressing the resource) by the format conversion step, and then passed to the storage put step, which pushes the resource to an internal or external repository and returns the resource location identifier. The resource location, together with the preview and the metadata for the resource, is finally pushed to the catalogue.

In the flow described above, it is not required for the binary data corresponding to the resource to actually pass through the entire flow. This means, for example, that if the binary is already in a repository, the storage put step can just return its resource location without pushing the binary into the storage. Also, the preparation step can retrieve only the information needed for generating metadata and preview, and not the binary data itself.

Each one of the blocks in the figure is implemented with a different user-selectable module. This approach permits the operator to set up the correct workflow for the particular resource to be ingested. The selection of the set of modules to apply is performed according to the publish request parameters (e.g. the resource type and the resource collection); a pipeline sketch follows the requirements list below. In more detail, the requirements of a resource ingestion component are:

[ARQ RIG01S] It shall be possible for the user (DP or PI) to submit publish requests for resources to be ingested into the Resource Management macro-component, according to the standards described in the EP Resource Ingestion Interface [EPRIS].
[ARQ RIG02S] The resource ingestion component shall retrieve the resource, extract metadata and produce a preview, package the resource into a format understandable by the platform, store it in an external or internal repository, and push the metadata, the preview and the resource location to the catalogue.
[ARQ RIG03S] The resource ingestion shall perform authorization, permitting upload of data only to authorized users.
[ARQ RIG04S] It shall be possible for the Operator, within the resource ingestion component, to define which user is authorized to access which resource and which resource collection.
[ARQ RIG05S] The resource ingestion shall impose quotas on resource usage for each user, in terms of a maximum amount of data and data size which can be ingested in total, per day, per month and per year.
[ARQ RIG06S] Quotas on resource usage imposed by the resource ingestion shall be configurable by the Operator for each user and each resource collection.
[ARQ RIG07S] The resource ingestion shall have a modular approach, implementing the following modules: preparation, metadata extraction, preview generation, format conversion, storage put, catalogue put.
[ARQ RIG08S] Ingestion modules shall be configurable by the Operator for the resource collection specified in the publish request.
[ARQ RIG09S] For each collection specified in the publish request, it shall be possible to specify a set of modules to be applied in a given order.


[ARQ RIG10S] Resource publish requests shall be compliant with the standards described in the EP Resource Catalogue Interface [EPRCS] document, management interface section.
[ARQ RIG11C] Preparation modules can include modules to uncompress data archive packages in TGZ, TAR, ZIP and RAR formats, and to download data from FTP, HTTPs and local repositories.
[ARQ RIG12H] Metadata extraction modules should include modules for extracting relevant metadata according to the schemas defined in the EP Resource Catalogue Interface [EPRCS], query interface section.
[ARQ RIG13S] Preview generation modules shall include support for generating previews in the formats specified in the EP Resource Catalogue Interface [EPRCS] document, preview interface section.
[ARQ RIG14S] Format conversion modules shall include a module to convert resource binary data archive formats to standard data archive ones, such as GZ, TGZ and ZIP.
[ARQ RIG15S] Storage put modules shall include modules to upload resource binaries via the FTP, WebDAV and GridFTP protocols.
[ARQ RIG16S] Catalogue put modules shall include modules to push metadata, RLS and preview to the Catalogue according to the standards described in the EP Resource Catalogue Interface [EPRCS] document, management interface section.
[ARQ RIG17S] It shall be possible to restrict access to resource ingestion only to authorized users (Data Providers or PIs), defined by the Operator. These users are considered owners of the uploaded resource.
[ARQ RIG18H] It should be possible for the Operator to configure custom Terms and Conditions for resource ingestion, to be approved by the Data Provider or PI prior to authorization to perform resource ingestion operations.
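The modular chain mandated by ARQ RIG07S-RIG09S can be sketched as an Operator-configured pipeline. This is a minimal illustration under assumed names (the context dictionary and module signatures are inventions); the real publish interface is defined in the [EPRIS] document.

```python
from typing import Any, Callable, Dict, List

# Sketch of the modular ingestion chain (ARQ RIG07S-RIG09S): per resource
# collection, the Operator configures an ordered list of modules
# (preparation, metadata extraction, preview generation, format conversion,
# storage put, catalogue put). All names are illustrative assumptions.

Module = Callable[[Dict[str, Any]], Dict[str, Any]]

PIPELINES: Dict[str, List[Module]] = {}   # collection name -> ordered modules


def register_pipeline(collection: str, modules: List[Module]) -> None:
    """Operator configuration: bind a module chain to a collection (RIG08S)."""
    PIPELINES[collection] = modules


def ingest(publish_request: Dict[str, Any]) -> Dict[str, Any]:
    """Run the configured module chain for the request's collection."""
    context = dict(publish_request)
    for module in PIPELINES[context["collection"]]:
        context = module(context)        # each step enriches the context
    return context                       # ends up holding metadata, preview
                                         # and the resource location


# Example module: a trivial preparation step recording the fetched file.
def prepare_local(context: Dict[str, Any]) -> Dict[str, Any]:
    context["files"] = [context["resource_reference"]]   # placeholder fetch
    return context
```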

2.3.3 Internal Repository

The system internal repository is a repository for the binary representation of resources to be stored within the platform. This repository may use different storage technologies, but it shall in any case satisfy the following requirements:

[ARQ INR01S] Internal repository shall support at least binary data objects represented in the form of 'files' (Products).
[ARQ INR02C] Internal repository can support custom metadata associated with a file.
[ARQ INR03C] Internal repository can support custom access control lists associated with a file.
[ARQ INR04H] Internal repository should be private, i.e. stored resources should be accessible only via the Resource Access Gateway component.
[ARQ INR05H] Internal repository should provide the possibility to define redundancy at collection or file level, selectable by the Operator.
[ARQ INR06S] Internal repository shall support management operations via the FTP or WebDAV protocol.
[ARQ INR07S] Internal repository shall export a single data product via the HTTP(s) protocol.
[ARQ INR08C] Internal repository can export data via the FTP protocol.
[ARQ INR09C] Internal repository can support retention policies, such as round robin, last used first, etc.

2.3.4 External Repository

This component is a placeholder for external resource repositories. These repositories are not under the direct control of the TEP platform, so they may implement access interfaces and AAI schemes not directly considered in this document. The system of modules in the Resource Ingestion component and the Resource Access Gateway shall ensure the correct management (for both PUT and GET operations) of the external repository. Besides the external repository's particular access and authentication mechanisms, the Exploitation Platform shall support at least WebDAV, FTP and GridFTP repositories, as reported in the Resource Ingestion and Resource Access Gateway requirements.

2.3.5 Resource Access Gateway

The Resource Access Gateway takes care of managing binary resource data access operations within the platform. This component is an abstraction layer on top of the internal or external storage repositories, capable of serving the other platform components with a unique binary data access protocol and a compatible single authentication and authorization system. The Resource Access Gateway shall satisfy the following requirements:

[ARQ RGW01C] Resource access gateway can have different implementations, each one supporting a given internal or external repository, all sharing the same data output interface.
[ARQ RGW02S] Resource access gateway implementations shall include implementations to support FTP, WebDAV and GridFTP repository technologies.
[ARQ RGW03S] Resource access gateway shall serve the platform with binary data access for download by the user or access for processing, according to the standards described in the EP Resource Access Interface [EPRAS] document.
[ARQ RGW04S] Resource access gateway implementations shall support: caching (local copy of the most used resources, to enhance performance and reduce binary data transfer), mirroring (local copy of the entire repository) and remote access (retrieval of the binary data without storing it locally).


[ARQ RGW05S] Resource access gateway shall support authorization based on per-user and per-group collection and file ACLs.
[ARQ RGW06H] Resource access gateway should integrate collection and file ACLs of the underlying repository, if present.
[ARQ RGW07S] Resource access gateway ACLs shall provide the capability to limit access to processing only, download only or both.
[ARQ RGW08S] Resource access gateway ACLs shall provide the capability to force Terms and Conditions approval before allowing resource download and/or resource access for processing.
[ARQ RGW09S] Resource access gateway ACLs shall be configurable via APIs, according to the standards described in the EP Resource Access Interface [EPRAS] document, management section.
[ARQ RGW10S] Resource access gateway shall support quotas at collection and file level.
[ARQ RGW11S] Resource access gateway quotas shall allow the Data Provider to define a given monthly, yearly, weekly, daily or total maximum amount of binary data download and/or access for processing operations for a single user or a user group.
[ARQ RGW12H] Resource access gateway should integrate per-user access quotas of the underlying repository, if present.
[ARQ RGW13S] Resource access gateway quotas shall be configurable via APIs, according to the EP Resource Access Interface [EPRAS] document, management section.
[ARQ RGW14S] Resource access gateway shall provide per-user quota status information, according to the EP Resource Access Interface [EPRAS] document, quota section.
[ARQ RGW15S] Resource access gateway shall provide costs for access to a given resource, for download and access for processing.
[ARQ RGW16S] Resource access gateway provided costs shall be associated to the single resource binary data with a pre-defined set of allowed operations (e.g. unlimited processing, three downloads of the same data, etc…)
[ARQ RGW17H] Resource access gateway should be capable of using collection and file cost information of the underlying repository, if present.
[ARQ RGW18S] Resource access gateway shall provide costs via API, according to the EP Resource Access Interface [EPRAS] document, cost section.
[ARQ RGW19S] Resource access gateway shall log and store accounting information about user download and/or access for processing to resources, including information about cost and quota usage, for retrieval by the Accounting and Monitoring component.
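To make the ACL and quota requirements above more concrete, here is a minimal sketch combining scope-limited ACLs ([ARQ RGW07S]), Terms and Conditions enforcement ([ARQ RGW08S]) and a monthly volume quota ([ARQ RGW11S]). All names and data structures are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Acl:
    scopes: set[str] = field(default_factory=set)  # "download", "processing"
    requires_terms_approval: bool = False

@dataclass
class MonthlyQuota:
    limit_bytes: int
    used_bytes: int = 0

def check_access(acl: Acl, quota: MonthlyQuota, scope: str,
                 size_bytes: int, terms_approved: bool) -> bool:
    """Return True if the request is allowed under the ACL and quota."""
    if scope not in acl.scopes:
        return False                       # RGW07S: scope-limited ACLs
    if acl.requires_terms_approval and not terms_approved:
        return False                       # RGW08S: force T&C approval
    if quota.used_bytes + size_bytes > quota.limit_bytes:
        return False                       # RGW11S: monthly volume quota
    quota.used_bytes += size_bytes         # RGW19S: account the usage
    return True

# Example: a user allowed to process a resource but not to download it
acl = Acl(scopes={"processing"}, requires_terms_approval=True)
quota = MonthlyQuota(limit_bytes=10 * 2**30)  # 10 GiB/month, assumed
assert not check_access(acl, quota, "download", 2**20, terms_approved=True)
```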


2.4 Service Integration

As reported in the general overview, the Service Integration macro-component makes it possible for a PI to include their own software applications into the platform as new processing services (Workflow or App), so that they can be executed on the platform data and ICT resources by the PI or by other users.

Including new scientific software within a platform is a task which can be performed following two major strategies. One is to re-develop the algorithm software within the platform itself. This process is usually referred to as migration and foresees a partial or complete rewrite of the application in the programming languages supported by the platform. The second approach is porting, where an application developed on a different platform (e.g. a normal PC) is ported to the Exploitation Platform. Migration usually brings better performance than porting, since re-developing an application within a platform allows consistent use of all the platform capabilities (parallelization, array processing, etc…). However, the effort to partially or totally rewrite the application, though dependent on the application complexity, is generally higher than the effort to perform porting. In this architecture, our design choices are driven by the goal of minimizing the integration effort. Thus, we will consider porting as the main integration strategy for the Service Integration macro-component. Nevertheless, to maximize the optimization of the code, some side components supporting migration will still be considered. The result will then be a mixed approach, where the majority of the code is ported to the platform, while some small parts which require high optimization are re-developed.

Before describing the details of this macro-component, it is important to understand the difference between Workflow and App processing services and the consequences of this difference for the integration process. As specified in the terminology, a Workflow envisages a set of pre-defined operations, pre-defined input and output data types for each operation, pre-defined orchestration logic and a pre-defined scope of the processing, which may be customised by the user via a set of pre-defined processing parameters. A Workflow processing is usually broken into different steps, chained together to form a graph. Each Workflow processing operation step, usually referred to as a job, includes the application software packages, the application software dependencies, descriptions of the step input, output and parameters, the information on how and where the processing can be executed (e.g. processing parallelizable over a Linux OS computing cluster) and any other relevant parameters. The job input, output and parameters are derived from the overall Workflow via a process named orchestration, where different jobs are executed in a given sequence to perform a more complex task. The Service Integration component provides the PI with an environment to define the jobs, the functional graph connecting the jobs and the orchestration. Moreover, it includes a function to test the Workflow entirely or step-by-step, tools to produce and link documentation about the Workflow usage, and tools to package the Workflow to be published as a new processing service.


An App is, instead, an activity which has no pre-defined set of operations to be performed, no pre-defined input or output and no pre-defined scope for the processing. The App assumes that processing operations are provided via direct user interaction. An App is thus left in complete control of the final user, who will use it like an application running on his own laptop. Instead of running on a laptop, however, an App runs on the execution environment: it is thus able to access directly the data present in the infrastructure, it is accessible from any computer connected to the internet and it consumes CPU and memory resources from the infrastructure. The Service Integration macro-component provides the tools to package a software application in a format which can be provisioned by the system on demand, with all its dependencies, GUI, usage documentation, required auxiliary data, etc…

Even if an App and a Workflow are different ways to perform processing activities, within this architecture they are considered two faces of the same coin. That is why, as can be seen from Figure 2-5, the Service Integration macro-component considers a single interface for the PI to integrate an application into the platform. This interface is provided by the DevBox component. The DevBox component offers to the PI an environment to perform porting of the application into the platform. The environment is constituted by a set of tools, compilers, libraries and SDKs required to adapt the software for execution within the platform. As an alternative to the DevBox, the PI can use the Interactive Development component, which is focused on allowing migration of part of the software application code into the platform. With respect to the DevBox, the Interactive Development component offers a more restricted environment, with a fixed set of allowed programming languages and libraries offered via a web IDE. In exchange for the more constrained environment, the migrated code can exploit the optimization capabilities of the platform.

The Test/Debug Tool provides both the DevBox and the Interactive Development environment with a wrapper to a test Execution Environment, which can be used by the PI to emulate the running of the application within the infrastructure. This tool is able to issue test execution requests to the execution environment and return debug information from the execution environment. As can be seen from the figure, both the DevBox and the Interactive Development components have no connection with the Resource Management macro-component for data resource access. The input or auxiliary data resources for the processing are, in fact, not fetched directly into the DevBox or Interactive Development, but accessed directly by the Execution Environment upon test processing service execution.

Another important component is the Packaging Tool, which takes on the task of extracting all the information specified by the PI in the DevBox or Interactive Development component, building the Workflow and App packages and pushing them to the Resource Management macro-component, to be stored into the platform and made available for processing services execution. This tool works in pair with the Import Tool, which performs the opposite job, gathering components from the Resource Management repositories and importing them into the DevBox, to be used, modified and ultimately re-integrated within the platform.

It shall be noted that the Import Tool cannot import software applications into the Interactive Development environment, since the Interactive Development environment requires as input the software source code, which is not stored in the App or Workflow package. Via the combined use of the Packaging Tool and the Import Tool, it is possible to perform a mixed integration strategy, migrating part of the software via the Interactive Development component, packaging it and importing it into the DevBox to be combined with the porting of the other parts of the software. This advanced integration strategy allows exploiting the optimization capabilities offered by the Interactive Development environment without completely losing flexibility in the integration of software applications.

As a final note, in order to provide a stable environment for the integration of an application, the Service Integration macro-component is going to be deployed several times, with instances dedicated to a particular App or Workflow to be integrated. That is why the DevBox and all the other components are usually packaged in the form of a Virtual Machine, which can be deployed within the platform or directly at the user premises (e.g. on the user's PC).

Figure 2-5 Service Integration macro-component architecture

2.4.1 DevBox

The DevBox is the main component of the Service Integration. This component aims to provide a full, application-agnostic environment for the porting of a generic application into the platform. The DevBox component shall meet the following requirements:
[ARQ DBX01H] DevBox should allow the porting of software regardless of the particular application logic, programming language, dependencies, etc…
[ARQ DBX02S] DevBox shall allow the PI to compile and port applications running on Linux-based OSes.


[ARQ DBX03C] DevBox can allow the PI to compile and port applications running on other, non-Linux-based OSes.
[ARQ DBX04S] DevBox shall mimic the Execution Environment, in the form of sharing the same OS, packages and environment configuration, in order to simplify the porting process.
[ARQ DBX05S] It shall be possible to specify, via the DevBox environment, the information about the required input data, auxiliary data, processing parameters, dependencies, libraries and output data for every processing step to be packaged inside a Workflow.
[ARQ DBX06S] Information about the input and auxiliary data resources for the Workflow should include the type of data allowed (e.g. EO data, published results, output results of other jobs/workflows) and the compatible collections (e.g. EO data belonging to a given collection, published results obtained as output from a given process).
[ARQ DBX07S] It shall be possible to specify, via the DevBox environment, the information about the required dependencies, libraries and access GUI for every software application to be packaged inside an App.
[ARQ DBX08S] Workflow and App information definition shall follow the standards described in the EP Processing Service Packaging [EPPPS] document.
[ARQ DBX09S] It should be possible to package the DevBox system, together with the other Application Integration components, into an instance (e.g. a VM) capable of running offline (e.g. on the PI's PC) or online (e.g. via SSH or a custom Web Browser interface).
[ARQ DBX10S] DevBox should integrate collaborative development tools provided by the Support Tools.
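As an illustration of the step information required by [ARQ DBX05S] and [ARQ DBX06S], below is a hypothetical Workflow step descriptor expressed as a Python data structure. The field names are assumptions made for illustration only; the normative format is defined by the EP Processing Service Packaging [EPPPS] document, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

@dataclass
class StepDescriptor:
    """Hypothetical description of one Workflow step (job), as it could
    be captured in the DevBox before packaging. The fields are
    illustrative, not the [EPPPS] schema."""
    name: str
    executable: str                       # command invoked on the cluster
    input_types: list[str]                # e.g. allowed EO data collections
    auxiliary_data: list[str] = field(default_factory=list)
    parameters: dict[str, str] = field(default_factory=dict)
    dependencies: list[str] = field(default_factory=list)
    output_types: list[str] = field(default_factory=list)

# Example step; the collection identifier and executable are assumed
ndvi_step = StepDescriptor(
    name="ndvi",
    executable="bin/compute_ndvi",
    input_types=["EO:Sentinel-2:L2A"],
    parameters={"cloud_mask": "true"},
    dependencies=["gdal>=2.0"],
    output_types=["GeoTIFF"],
)
```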

2.4.2 Interactive Development

The Interactive Development is a secondary component of the Service Integration, which allows migration of code for more advanced PIs who want to optimize their code for running on the platform. This component aims to provide a web IDE, supporting a limited set of programming languages and libraries, where the PI can write the source code for their software application. The Interactive Development component shall meet the following requirements:
[ARQ IDD01S] Interactive Development shall provide an interactive web Integrated Development Environment, available via a generic web browser.
[ARQ IDD02S] The PI shall be able to input code into the Interactive Development environment according to a pre-defined set of programming languages.
[ARQ IDD03H] The pre-defined set of programming languages should include, at least, Python and Java.


[ARQ IDD04H] Interactive Development should allow interactive development and testing of the application for debugging, allowing the PI to run one line of code at a time.
[ARQ IDD05S] It shall be possible to specify, via the Interactive Development environment, the information about the required input data, auxiliary data and processing parameters to be transmitted to the Packaging Tool.

2.4.3 Packaging Tool

The Packaging Tool extracts the information on the ported software from the DevBox or the Interactive Development environments, packages the software into a Workflow or an App and uploads the packages into the Resource Management, for testing or other usage. The Packaging Tool shall support the following requirements:
[ARQ PGT01S] The packaging tool shall produce Workflow packages according to the standards described in the EP Processing Service Packaging [EPPPS] document, for non-interactive processing services packaging.
[ARQ PGT02S] The packaging tool shall produce Application packages according to the standards described in the EP Processing Service Packaging [EPPPS] document, for interactive processing services packaging.
[ARQ PGT03S] The packaging tool shall support versioning and branches for the built packages.
[ARQ PGT04S] The packaging tool shall, by default, build and upload packages as test versions, allowing the user to publish final versions only explicitly.
[ARQ PGT05S] The packaging tool shall compile, if required, the software application source code and include or link all the software application dependencies within the output package for a correct execution within the platform.
[ARQ PGT06S] The packaging tool shall be able to register App and Workflow packages into the Catalogue component according to the standards described in the EP Resource Catalogue Interface [EPRCS] document, management interface section.
[ARQ PGT07S] The packaging tool shall be able to specify, for each of the packaged versions, on PI request, the permissions for package download and processing to the public, a given user or a given group, to the Resource Access Gateway component, via the component-provided APIs.
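The following is a minimal sketch of the default-to-test versioning behaviour of [ARQ PGT03S] and [ARQ PGT04S]: packages are built as test versions unless explicitly marked final. The `build_package` function, the version tag convention and the digest side file are assumptions for illustration, not the [EPPPS] package format.

```python
import hashlib
import tarfile
from pathlib import Path

def build_package(src_dir: Path, out_dir: Path, name: str,
                  version: str, final: bool = False) -> Path:
    """Hypothetical packaging step: bundle the ported software into a
    versioned archive, marked as a test build unless explicitly final."""
    tag = version if final else f"{version}-test"   # test by default
    out_dir.mkdir(parents=True, exist_ok=True)
    package = out_dir / f"{name}-{tag}.tar.gz"
    with tarfile.open(package, "w:gz") as tar:
        tar.add(src_dir, arcname=name)
    # A content digest could serve as an integrity reference on upload
    digest = hashlib.sha256(package.read_bytes()).hexdigest()
    package.with_name(package.name + ".sha256").write_text(digest + "\n")
    return package
```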

2.4.4 Import Tool

The Import Tool's objective is to provide the PI with the possibility to include software and processing developed by other users within their App or Workflow. This can be achieved in two ways: one is linking, the other is editing. Linking is the action of linking an App as a dependency in a new App definition, or linking a Workflow as a processing step in a new Workflow definition.


In this case, the PI does not need to have permission for package download, but only for package access for processing. The edit function, instead, requires the PI to be authorized to download the App or Workflow package, which will be extracted by the Import Tool and can then be modified by the PI and included in their own processing service. The Import Tool is particularly useful to improve collaboration, giving the PI the possibility to reuse work performed by other scientists and to access examples of already integrated applications, which can ease the PI's integration work. The Import Tool shall support the following requirements:
[ARQ IMT02S] The import tool shall provide the PI with the possibility to link existing Workflows as processing steps into his integrated Workflow.
[ARQ IMT03S] The import tool shall provide the PI with the possibility to download Workflow packages produced according to the EP Processing Service Packaging [EPPPS] document from the Resource Management macro-component.
[ARQ IMT04S] The import tool shall provide the PI with the possibility to extract the contents of Workflow packages produced according to the EP Processing Service Packaging [EPPPS] document, including the Workflow logic, all the workflow steps and all the related information, into the DevBox.
[ARQ IMT05S] The import tool shall provide the PI with the possibility to link existing Apps as dependencies into his integrated App.
[ARQ IMT06S] The import tool shall provide the PI with the possibility to import App packages produced according to the EP Processing Service Packaging [EPPPS] document, extracting the packaged applications, logic and all the related information into the DevBox.
[ARQ IMT07S] The PI shall be able to specify to the import tool the package to be linked or edited as a direct reference to the package obtained from the Catalogue, the Marketplace or the Collaboration Bucket components.

2.4.5 Test/Debug Tool

The Test/Debug Tool's task is to aid the testing and debugging of the software, so that the user can verify the successful porting of the processing service. The test tool performs Execution Requests to the Execution Environment in order to execute the software. These requests shall be as close as possible to the normal requests performed via the User Access Portal. In order to execute a processing service in the Execution Environment, the package needs to be uploaded to such environment. Thus, this component usually calls the Packaging Tool directly to upload the application to the Execution Environment before executing the test. The execution requests submitted by the Test/Debug Tool are based on the definition of standard test packages. These test packages include information on how to run the application, which input products to use, which parameters to run with and which is the expected output. The Test/Debug Tool may provide a simple interface to build these tests.


In order not to interfere with or disrupt the normal Execution Environment operations, in a common deployment of an Exploitation Platform this component submits Execution Requests to a test deployment of the Resource Management. This is, in any case, an operational choice performed by the Administrator during deployment, since there shall be no difference, from the interface point of view, between the test and operational Execution Environment. The Test/Debug Tool shall meet the following requirements:
[ARQ TDT01S] The Test/Debug tool shall submit processing service Execution Requests to the Execution Environment.
[ARQ TDT02S] It shall be possible, via the Test/Debug tool, to set up all the parameters and inputs of the processing.
[ARQ TDT03S] It shall be possible, via the Test/Debug tool, to define the expected output of a test in the form of a custom test function.
[ARQ TDT04S] It shall be possible, via the Test/Debug tool, to save a test into a test package, generated according to the standards described in the EP Virtual Experiment Packaging [EPVEP] document.
[ARQ TDT05S] The Test/Debug tool shall call the Packaging Tool prior to creating an Execution Request, in order to make sure that the latest test version of the integrated software is uploaded.
[ARQ TDT06S] The Execution Environment instance to be used by default by the Test/Debug tool shall be configurable by the Operator user.
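As an illustration of the custom test function mentioned above, below is a minimal sketch of a test definition carrying inputs, parameters and an expected-output check. The structures are illustrative assumptions and do not follow the [EPVEP] test package format.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TestCase:
    """Hypothetical test definition: inputs, parameters and a custom
    check over the produced output."""
    service: str
    inputs: list[str]
    parameters: dict[str, str] = field(default_factory=dict)
    check: Callable[[bytes], bool] = lambda output: True

def run_test(test: TestCase, execute: Callable[..., bytes]) -> bool:
    """Submit the test Execution Request via `execute` and apply the
    expected-output check function to the result."""
    output = execute(test.service, test.inputs, test.parameters)
    return test.check(output)

# Example with a stubbed execution backend and an assumed product name
stub = lambda service, inputs, params: b"NDVI:0.42"
test = TestCase(
    service="ndvi-workflow",
    inputs=["S2A_MSIL2A_20160101.SAFE"],
    check=lambda out: out.startswith(b"NDVI:"),
)
assert run_test(test, stub)
```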

2.5 Execution Environment

The Execution Environment macro-component provides a shared environment to run processing services within the platform. This environment is capable of running both Workflow and App processing services. The Execution Environment is usually not directly accessed by the Final User, the PI or the Data Provider, but only by the Operator (for its configuration and management), and its interfaces are only internal interfaces to the other macro-components. The Execution Environment is the most demanding of the macro-components in terms of ICT resources, since it performs all the computing tasks within the platform. Also, considering that the Service Integration component design prefers easier application integration over application performance, the ICT resources needed by this component to perform a requested processing activity may be quite high. For this reason, the design of this component has been driven by the need to minimize the costs of the underlying ICT infrastructure, optimizing its usage via ICT resource sharing and flexible ICT resource provisioning. A representation of the Execution Environment macro-component is reported in Figure 2-6.


The core of the environment is the Execution Cluster. This component represents the infrastructure where the processing services will execute. The Execution Cluster is not a Computing Cluster in the classical meaning of the word (a set of tightly connected computers), but a Cluster in its wider sense: a set of ICT resources (memory, CPU, GPU, Network, Disk and other resources, whether physical, cloud, HPC, HTC, etc…) which are tightly connected and can be seen as a unique ICT resource pool for running a computing activity.

Figure 2-6 Execution Environment macro-component architecture

The Execution Cluster resource pool is not fixed, but can shrink or grow in time. This variation follows dynamic provisioning logics (i.e. in a Cloud environment), which are implemented by a dedicated component, the Capacity Manager. In its abstract representation, an Execution Cluster is capable of running a generic processing service, which can be a Workflow or an App. Two different components, named Workflow Manager and App Manager, manage respectively the execution of Workflows and Apps. The Execution Environment needs the Workflow and App packages provided by the Resource Management macro-component for the processing services. These packages are deployed on the Execution Cluster by the Workflow Manager and App Manager. It is therefore an additional task of the Workflow Manager and App Manager to make sure that the packages are correctly deployed in the Execution Cluster and that the overhead for package deployment is minimised. In order to perform the computing activity, the Execution Cluster needs access to the input and auxiliary data managed by the Resource Management macro-component. This access is provided via the Workflow and App Manager functionalities, which bridge the Resource Management APIs to the Execution Cluster technologies (e.g. representing the data as a virtual file system for the Workflows and Apps to access).


The user Execution Request is managed by the Execution Gateway, which is the primary front-end of the Execution Environment. This component analyses the request and forwards it to the Workflow Manager or App Manager, according to the type of processing service requested. Also, since the Execution Cluster may not be a single entity, but can be constituted by multiple different Execution Clusters implementing different technologies (i.e. Cloud, HPC, HTC, etc…), the Execution Gateway will also decide on which Execution Cluster to run the activity, according to Cluster technology, usage, cost, data access and other policies.

2.5.1 Execution Cluster

As explained in the previous paragraph, the Execution Cluster is the resource pool for the execution of processing services within the platform. This pool may be implemented with different technologies (i.e. Cloud, HPC, HTC, etc…), with the aim to aggregate as many resources as possible and thus minimize the cost of running the processing within the infrastructure. However, the choice of a given technology is not entirely free, since the Execution Cluster shall satisfy the following requirements:
[ARQ ECS01S] The Execution Cluster shall be able to support execution of standard Linux command-line applications, executed with limited user permissions.
[ARQ ECS02S] The Execution Cluster shall support execution on different Linux distributions and base OSes.
[ARQ ECS03H] It should be possible for the Operator to define custom base OSes for the application executions.
[ARQ ECS04C] The Execution Cluster can support custom allocation of resources, in terms of memory, CPU, disk or network, for a given execution.
[ARQ ECS05C] The Execution Cluster can support pre-allocation of resources, in terms of memory, CPU, disk or network, for a given execution, dedicated to a given user, group or processing service.
[ARQ ECS06H] The Execution Cluster should support an interface to provide status and usage information to the other components via a given API.
[ARQ ECS07H] The Execution Cluster should be able to scale the resource pool via a given API, to be used by the Capacity Manager component.
[ARQ ECS08S] The Execution Cluster shall support processing service executions whose duration is not limited in time.
[ARQ ECS09H] The Execution Cluster should support the possibility for an Operator to set an overall limit on the duration of processing service executions.
[ARQ ECS10H] The Execution Cluster should be able to block a processing service execution on request from the Workflow Manager or App Manager and release the reserved resources.


[ARQ ECS11C] The Execution Cluster can be able to pause a processing service execution on request of the Workflow Manager or App Manager, releasing partially or totally the reserved resources, and then resume it later.

2.5.2 Capacity Manager

The Capacity Manager is a component tightly connected to the Execution Cluster. The task of this component is to vary the cluster size according to a set of policies. The Capacity Manager is very important to minimise the cost of the infrastructure for Execution Cluster technologies which support dynamic scaling of ICT resources (e.g. Cloud environments). The Capacity Manager component shall implement the following requirements:
[ARQ CMG01S] The Capacity Manager component shall increase or decrease the size of the Execution Cluster resource pool according to a set of configured policies.
[ARQ CMG02S] The Capacity Manager component shall support scaling policies based on: resource usage, resource cost, minimum and maximum resources.
[ARQ CMG03S] It shall be possible for the Operator to configure custom policies within the Capacity Manager system in the form of custom policy modules.
[ARQ CMG04S] It shall be possible for the Operator to trigger manually an increase or decrease of the cluster via the Capacity Manager component.
[ARQ CMG05S] The Capacity Manager component shall support multiple Execution Cluster APIs via a set of modules.
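To make the scaling policies of [ARQ CMG02S] more concrete, here is a minimal sketch of a usage-based policy bounded by minimum and maximum resources. The thresholds and the single-node scaling granularity are assumptions for illustration, not specified policy values.

```python
def scale_decision(current_nodes: int, usage_ratio: float,
                   min_nodes: int, max_nodes: int,
                   high: float = 0.8, low: float = 0.3) -> int:
    """Hypothetical usage-based scaling policy: return the new cluster
    size, bounded by the configured minimum and maximum resources."""
    if usage_ratio > high:
        target = current_nodes + 1        # scale out under load
    elif usage_ratio < low:
        target = current_nodes - 1        # scale in when idle
    else:
        target = current_nodes
    return max(min_nodes, min(max_nodes, target))

# Example: a loaded cluster grows, but never beyond max_nodes
assert scale_decision(4, 0.92, min_nodes=2, max_nodes=5) == 5
assert scale_decision(5, 0.95, min_nodes=2, max_nodes=5) == 5
assert scale_decision(3, 0.10, min_nodes=2, max_nodes=5) == 2
```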

2.5.3 Workflow Manager

The Workflow Manager component manages the execution of the processing operations defined within the Workflow, here referred to as jobs. Therefore, the Workflow Manager takes care of the setup of the environment, the invocation of the job executable, the restart of a job if the processing fails, the gathering of the logs and output of each job and of the entire workflow, and all the other workflow management related operations (e.g. pause of a workflow, migration of a workflow from one execution environment to another, restart of the workflow, etc…).

Considering the nature of the workflows involved in the Exploitation Platform, the Workflow Manager considered in this architecture is what is normally referred to as a Scientific Workflow Management system. This kind of workflow management system has an approach slightly different from normal Business Management workflows, being more focused on the execution of heavy computational tasks and multiple data manipulation steps than on just the definition and monitoring of a pre-defined set of operations. There are several implementations of Scientific Workflow Management systems; the one considered in this architecture can be schematized as reported in Figure 2-7.

Figure 2-7 Workflow Manager component sample conceptual schema

In this schema, the Workflow Manager is constituted by a Workflow Engine. This component analyses the Workflow and the execution request, splits it into its Jobs and sends the jobs to the execution cluster via the Job Execution component. The Job Execution will resolve the required input and auxiliary data, gather it from the Resource Management macro-component, deploy the Job software in the Execution Cluster and run the Job software. After a job is completed, the Workflow Engine will gather the job output and use it as input for the next jobs. At the end of the workflow, the final processing results are pushed to the Resource Management macro-component, to be then accessed by the final user. Another sub-component of the Workflow Manager represented in the figure is the Job Monitor, which gathers the job logs and monitors the correct execution of the Jobs. This sub-component aggregates the job logs into the final execution logs of the workflow, for the operator to check the correct execution of the entire workflow. The Job Monitor is an active monitor, so it can take actions, interrupting, modifying or restarting the workflow if an error in the execution is detected. At the end of the processing, the final workflow logs are sent back by the workflow manager in the response to the Execution Request or packaged into the results.

The Workflow Manager may run the different Jobs constituting the workflow in parallel or sequentially, according to the graph defined within the Workflow and the availability of the input. Moreover, a single Job can be executed in the Execution Cluster using a variable resource pool, involving several nodes of the cluster in parallel. This heavily parallelized approach permits improved efficiency and resource usage.

The Job Execution is performed according to the flow specified in Figure 2-8. The first step in the job execution is the Input Resolution. This step gets the input data list of the particular job from the execution request, resolves the required auxiliary data via queries to the Resource Management macro-component, and aggregates the inputs and the auxiliary data into parallel job executions to be run on the Execution Cluster. At the same time, the environment for the parallel job executions is created in the Execution Cluster by the Environment Preparation step.

Figure 2-8 Workflow Manager component Job Execution sample flow

The Environment Preparation step deploys the job and its dependencies (if not already deployed) and configures them according to the job parameters specified in the job Execution Request. The next step is the actual Execution of the job on the execution cluster, which will access the input and auxiliary data from the Resource Management macro-component. This Execution is done in parallel, via array processing, by running multiple instances of the same job on the Execution Cluster with different inputs. In the end, the Output/Log collection step gathers the parallel Jobs' output and the Job Logs to be sent to the Workflow Engine and the Job Monitor.

An important point to note in the figures is that the Workflow software application never interfaces directly with the Resource Management macro-component, since the gathering of the input data, the resolution of the auxiliary data and the publishing of the results are done by the Workflow Manager. This is required to have Workflow software application packages which do not depend on the particular interfaces of the Resource Management macro-component. The overall Workflow Manager, as described above, needs to comply with the following requirements:
[ARQ WMG01S] Workflow Manager shall be able to manage the execution of Workflows as produced by the Service Integration macro-component, according to the standards described in the EP Processing Service Packaging [EPPPS] document for non-interactive processing services packaging.
[ARQ WMG02S] Workflow Manager shall execute a Workflow, return the Processing Logs and publish the processing output results to the Resource Management macro-component upon successful job execution.
[ARQ WMG03S] Workflow Manager shall deploy the Workflow software application on the Execution Cluster.
[ARQ WMG04H] Deployment of Workflow software applications into the Execution Environment should be done once for every new software requested, to minimise the overhead of the deployment.
[ARQ WMG05S] Workflow Manager shall be able to monitor the execution of all the Workflow jobs in order to spot errors in the processing.
[ARQ WMG06S] The monitor functionality of the Workflow Manager shall be able to perform corrective actions if errors are detected during the processing, such as interrupting the overall processing or restarting the entire processing or part of it.
[ARQ WMG07S] The corrective actions of the Workflow Manager monitor functionality shall be configurable by the Operator.
[ARQ WMG08S] Workflow Manager shall support the possibility to split jobs into parallel executions according to an array processing paradigm.


[ARQ WMG09S] Workflow Manager shall be able to resolve job input auxiliary data according to the policies specified in the Workflow, via the interfaces provided by the Resource Management macro-component.
[ARQ WMG10S] Workflow Manager shall retrieve job input and auxiliary data using the interfaces provided by the Resource Management macro-component and provide it to the job software for the Execution.
[ARQ WMG11S] Workflow Manager shall provide job input and auxiliary data to the job software via virtual file system access.
[ARQ WMG12S] Workflow Manager shall provide the job software with a command-line tool, which can be integrated in the Workflow software, to perform independent queries to the Resource Management catalogue and independent downloads of additional resources.
[ARQ WMG13S] Workflow Manager shall publish the final output of a Workflow to the Resource Management macro-component.
[ARQ WMG14C] Workflow Manager can provide command-line tools to perform independent publishing of results to the Resource Management macro-component.
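The Job Execution flow of Figure 2-8 can be sketched as follows: input resolution, environment preparation, parallel array execution and output/log collection. All function names are hypothetical and the steps are stubbed; a real implementation would query the Resource Management macro-component and deploy on the Execution Cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_inputs(request: dict) -> list[dict]:
    """Input Resolution: pair each input product with its auxiliary
    data (stubbed; a real step would query the catalogue)."""
    aux = request.get("auxiliary", [])
    return [{"product": p, "auxiliary": aux} for p in request["inputs"]]

def prepare_environment(job: dict) -> None:
    """Environment Preparation: deploy the job software and its
    dependencies on the cluster if not already deployed (stubbed)."""
    pass

def execute(task: dict) -> tuple[str, str]:
    """Execution of one array element; returns (output, log)."""
    product = task["product"]
    return f"result({product})", f"processed {product}"

def run_job(job: dict, request: dict) -> tuple[list[str], list[str]]:
    """Run one Workflow job as parallel array processing and collect
    outputs and logs for the Workflow Engine and the Job Monitor."""
    tasks = resolve_inputs(request)
    prepare_environment(job)
    with ThreadPoolExecutor() as pool:    # array processing, cf. WMG08S
        results = list(pool.map(execute, tasks))
    outputs = [out for out, _ in results]
    logs = [log for _, log in results]
    return outputs, logs

outputs, logs = run_job(
    job={"name": "ndvi"},
    request={"inputs": ["S2A_T32TQM", "S2A_T32TQN"], "auxiliary": ["DEM"]},
)
```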

2.5.4 App Manager

The App Manager takes care of the execution of an App in the Execution Cluster. As detailed in the introduction, an App is different from a Workflow, since it has no pre-defined input, no fixed execution time and no expected output data. Figure 2-9 depicts a sample scheme of an App Manager component. The first component parses the Execution Request and the App package and prepares the App environment on the Execution Cluster. This Environment Preparation resolves the App dependencies, deploys the software into the Execution Cluster and configures it according to the Execution Request. The next step is the actual execution of the software, with access to the Data from the Resource Management macro-component. The Execution will generate a set of instantiation logs, depicting the status of the processing (e.g. the amount of resources in use, the correct execution of the software), which may be checked by the Operator to ensure the correct working of the App.

Figure 2-9 App Manager component sample conceptual scheme

The most important component in this scheme is the App Access Gateway. This takes care of the interaction with the Final User, who needs to drive the processing performed via the App. If the App software already provides web service access to the App itself, this will simply be forwarded to the final user. If the App software does not provide web service access, which is the most common case for software originally designed to run on a desktop environment, the App Access Gateway will virtualise the software GUI and provide it via a web interface to the Final User for the interactive definition of the processing operations. Since the activities performed by an App depend on a direct interaction of the App with the final users, the App may, for some time, not perform any activity while still consuming system resources. For this reason, the App Monitor sub-component is present in the App Manager scheme. This sub-component monitors the usage of the App and puts it in pause (releasing the resources reserved by the App) if it is not used, according to a set of policies defined by the Operator.

An important point to note in the figure is that the App software application never interfaces directly with the Resource Management macro-component, since the gathering of the input data, the queries to the Resource Management catalogue and the publishing of the results are done via the App Manager. This is required to have App packages and App software independent of the particular interfaces of the Resource Management macro-component. The App Manager shall comply with the following requirements:
[ARQ APP01S] App Manager shall be able to manage the execution of Apps as produced by the Service Integration macro-component, according to the standards described in the EP Processing Service Packaging [EPPPS] document for interactive processing services packaging.
[ARQ APP02S] App Manager shall support requests for the start of a new App, as the running of the App and its dependencies on the Execution Cluster.
[ARQ APP03S] App Manager shall support requests for the stop of a running App, as the end of the running of the App on the Execution Cluster and the release of the App resources.
[ARQ APP04S] App Manager should support requests for the pause of an App, as the partial release of the App resources (e.g. only memory and CPU), maintaining the App status until a resume request is performed by the user.
[ARQ APP05H] App Manager should support requests for modification of the App resources, for Apps in pause state.
[ARQ APP06S] App Manager shall provide access to the input and auxiliary data resources to the App software using the interfaces provided by the Resource Management macro-component.


[ARQ APP07S] App Manager shall provide access to the input and auxiliary data to the App software via a virtual file system.
[ARQ APP08S] App Manager shall provide the App software with a command-line tool to perform independent queries to the Resource Management catalogue and independent downloads of additional resources.
[ARQ APP09S] App Manager shall publish the output of the App processing (e.g. provided by the user by saving output to a results folder via the App software) to the Resource Management macro-component upon App stop.
[ARQ APP10S] App Manager shall provide command-line tools, which can be integrated in the App software, to perform independent publishing of results to the Resource Management macro-component.
[ARQ APP11S] App Manager shall provide an App Access Gateway service offering a web interface to the Final User for interactive execution of the App software processing operations.
[ARQ APP12S] App Access Gateway shall integrate a web-browser emulated VNC server standard for GUI application interaction with the final user.
[ARQ APP13H] App Access Gateway should integrate a web-browser emulated X11 server for GUI application interaction with the final user.
[ARQ APP14S] App Access Gateway shall integrate web-browser emulated Linux terminal access for command-line application interaction with the final user.
[ARQ APP15H] App Manager should monitor the execution of the App, being able to change the state of an App (pause, resume or stop) according to a set of custom policies defined by the Operator.
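As an illustration of the App Monitor policy in [ARQ APP15H], below is a minimal sketch that pauses an idle App so its reserved resources can be released. The idle threshold and the state model are assumptions for illustration.

```python
import time

class AppMonitor:
    """Hypothetical App Monitor: pause an App that has been idle longer
    than an Operator-defined threshold. The 30-minute default is an
    assumption, not a specified value."""

    def __init__(self, idle_limit_s: float = 30 * 60):
        self.idle_limit_s = idle_limit_s
        self.last_activity = time.monotonic()
        self.state = "running"

    def record_activity(self) -> None:
        """Called on each user interaction with the App."""
        self.last_activity = time.monotonic()
        if self.state == "paused":
            self.state = "running"        # resume on new interaction

    def tick(self) -> str:
        """Periodic check: pause the App if the idle policy triggers."""
        idle = time.monotonic() - self.last_activity
        if self.state == "running" and idle > self.idle_limit_s:
            self.state = "paused"         # release CPU/memory, cf. APP04S
        return self.state

monitor = AppMonitor(idle_limit_s=0.01)
time.sleep(0.02)
assert monitor.tick() == "paused"
```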

2.5.5 Execution Gateway

The Execution Gateway acts as the main interface of the Execution Environment to the other macro-components and, ultimately, to the final user. The Execution Gateway's task is to provide a single, standard-compliant interface to submit activities to the Execution Environment, regardless of the nature of the activity and of the kind of Manager (App or Workflow) or Execution Cluster required to run it. The Execution Gateway also has the task to optimize the usage of the infrastructure, selecting the best Workflow Manager, App Manager or Execution Cluster to run a given application in view of performance and cost-saving considerations (e.g. the Execution Cluster closest to the input data required by the application, or the Execution Cluster currently under-loaded).
[ARQ EXG01S] Execution Gateway shall dispatch the Execution Request to a given App Manager or Workflow Manager according to the type of request (i.e. whether the request is for a Workflow or an App).


[ARQ EXG02S] Execution Gateway shall dispatch the Execution Request to a given App Manager or Workflow Manager, taking into account the served Execution Cluster according to a set of pre-defined policies, based on the current Execution Cluster load, the availability of the required ICT and other resources and the required data I/O performance.
[ARQ EXG03S] Execution Gateway pre-defined policies for dispatch to a given Execution Cluster shall be configurable by the Operator.
[ARQ EXG04S] Execution Gateway shall provide an interface for submission of the Execution Request according to the standards described in the EP Processing Service Execution Interface [EPPES] document.
[ARQ EXG05S] Execution Gateway shall support dispatch to multiple App Manager and Workflow Manager implementations via a modular approach.
[ARQ EXG06S] Execution Gateway shall support authorization, with the possibility to restrict usage of an Execution Cluster to given users and groups.
[ARQ EXG07S] Execution Gateway shall support per-user quotas on total concurrent resource usage, concurrent workflow or app executions, total resources used per month, day and year, and total workflow or app executions per month, day and year.
[ARQ EXG08S] It shall be possible for the Operator to configure, via the Execution Gateway, a list of processing services allowed to run on a given Execution Cluster, with reference to the App/Workflow package from the Resource Management macro-component.
[ARQ EXG09S] Execution Gateway shall provide, per user, the list of the processing services available on the Execution Environment, according to the standards described in the EP Processing Service Execution Interface [EPPES] document.
[ARQ EXG10S] Execution Gateway shall provide, per user, the list of the datasets allowed to be used with a given processing service, according to the standards described in the EP Processing Service Execution Interface [EPPES] document.
[ARQ EXG11S] Execution Gateway shall provide, per user, quota status information about the usage of each processing service, according to the standards described in the EP Processing Service Execution Interface [EPPES] document, quota section.
[ARQ EXG12S] Execution Gateway shall verify, prior to forwarding the Execution Request, that the user has the required access for processing to all the resources required for the execution of the processing service.
[ARQ EXG13H] Execution Gateway should provide an interface to simulate the execution of an App or Workflow, returning the cost estimation for a given processing according to the standards described in the EP Processing Service Execution Interface [EPPES] document, processing simulation section.
[ARQ EXG14S] Execution Gateway shall determine the total cost of the processing at the end of the processing.


[ARQ EXG15S] Execution Gateway shall log and store per-user Execution Requests, including details about consumed resources, cost, processing execution time, processing service package used and execution cluster information, for retrieval by the Accounting and Monitoring component.
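To make the dispatch policies of [ARQ EXG02S] more concrete, here is a minimal sketch that scores candidate Execution Clusters by data locality and current load. The scoring weights are illustrative assumptions, not specified policy values.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    load: float             # 0.0 (idle) .. 1.0 (fully loaded)
    holds_input_data: bool  # data locality with the requested inputs

def choose_cluster(clusters: list[Cluster]) -> Cluster:
    """Hypothetical dispatch policy: prefer clusters holding the input
    data (to reduce data transfer), then the least loaded one."""
    def score(c: Cluster) -> float:
        return (0.5 if c.holds_input_data else 0.0) + (1.0 - c.load)
    return max(clusters, key=score)

clusters = [
    Cluster("cloud-a", load=0.2, holds_input_data=False),
    Cluster("hpc-b", load=0.6, holds_input_data=True),
]
assert choose_cluster(clusters).name == "hpc-b"
```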

2.6 User Access Portal

The User Access Portal macro-component is the main interface of the system to the Final User and the Operator users. The main task of this component is to provide an easy-to-use interface to the platform functionalities, which can be tailored to the needs of the particular user community. The UAP macro-component also offers a set of common underlying components, whose scope is to harmonize user management and provide an easier management of the platform.

As shown in Figure 2-10, the Final User facing components of the User Access Portal are the Marketplace, which provides search and discovery of the platform services, including data and collaboration tools; the Workspace, which provides an environment for using the platform processing services; the Collaboration Bucket, which permits sharing of processing services and results created within or outside the platform; and the Support Tools, which provide support to the Final User for the proper usage of the platform. The Workspace, Collaboration Bucket, Marketplace and other components may internally integrate functionalities provided by the Geo Resource Browser. This component is essentially a client to the geographical data and results stored within the Resource Management macro-component, and can be used to select input and auxiliary files for the processing and the processing parameters, and to visualize and share the results of the processing and the products available in the platform.

Figure 2-10 User Access Portal macro-component architecture

The Workspace component, instead, acts as a client of the Execution Environment macro-component, permitting the final user to submit Execution Requests, in terms of App or Workflow processing service executions. This component permits the user to manage the processing service activity, starting and stopping processing services, checking the status, interacting with the processing service and retrieving its results. Submission of Execution Requests on behalf of the user is also performed by the internal Systematic Processing component. This component manages bulk or near-real-time processing and is able to submit processing requests each time a bulk processing chunk completes or a trigger for near-real-time processing is executed. The processing results generated by the Execution Requests executed by this component constitute a new product series, which can be exposed via the collaboration tools or the user workspace as a new product.

The Exploitation Platform Operator's main interface is the Management Console. This component acts as the entry point to all the management interfaces of the components in the platform. The User Access Portal macro-component provides two shared functionalities. The first is the AAI (Authentication and Authorization Infrastructure) component. This component's task is to consistently authenticate the user on all the different platform components and provide a single place for managing the information required for authorization (e.g. user groups). The second shared functionality is provided by the Accounting and Monitoring component, which collects information on the status of the platform and its usage and makes it available to the Operator via the Management Console. More details on the UAP components are provided in the next paragraphs.

2.6.1 Marketplace

As described in the introduction, the Marketplace component provides an interface for the final user to discover processing services, products and other capabilities of the platform. The scope of this component is to compare the different platform services and help the user in selecting the best service for his needs. The marketplace is therefore a database of service entries, where service entries are usually separated by service class (Workflow, App, EO Data, Processing Results and Articles). Only some of these services will be processing services (Workflows and Apps), while others will be datasets available for download (input and auxiliary data), processing results or information documents. The details on how to present these service entries are mostly related to the particular deployment of the Exploitation Platform, therefore this architecture will focus only on the general functionalities of the marketplace. The Marketplace component shall comply with the following requirements:
[ARQ MKP01S] Marketplace shall display a list of service item entries, separated in classes and with configurable names: Workflow processing services (e.g. named Workflow), App processing services (e.g. named App), available input product and auxiliary product collections (e.g. named EO Data), sets of processing results (e.g. named Processing Results), and documents from the Support Tools (e.g. named Articles).

[ARQ MKP02S] It shall be possible to extend the service list with the definition of new classes for service entries, tailored to a particular community.
[ARQ MKP03S] It shall be possible to organize the Marketplace service list by thematic categories and service class categories.
[ARQ MKP04S] Marketplace list items shall show the following general attributes: description, publisher, associated price, category, access criteria (restricted, free to use, etc…), associated tags and a link to the item documentation (on the Support Tools).
[ARQ MKP05S] The Marketplace list shall provide the user, upon request, with a link to the item reference, pointing to the Resource Management macro-component catalogue entry reference.
[ARQ MKP06S] The Marketplace list shall redirect the user, upon request, to a link to access the service (e.g. a link to the Workspace for the execution of the processing service).
[ARQ MKP07S] Marketplace shall connect to the Resource Management macro-component, via the Resource Management macro-component API, to get the list of items, the link to reference each item and the relevant information for all the resources displayed as service items, for the items registered in the Resource Management macro-component.
[ARQ MKP08C] Marketplace can integrate the Geo Resource Browser component for interfacing with the Resource Management macro-component.
[ARQ MKP09C] The Marketplace list can provide the possibility to vote for and comment on a service, according to the Resource Management macro-component API, for the items registered in the Resource Management macro-component.
[ARQ MKP10S] Marketplace shall be able to import items published in the Collaboration Bucket. This import shall be manual and performed by the Operator (after proper validation of the services).
[ARQ MKP11S] Marketplace shall provide the user with capabilities to perform searches on the service list, as free text or filtered by any of the entry attributes (e.g. via tags), by the entry type (e.g. Processing Results, EO Data, Workflow or App) or by the entry category.
[ARQ MKP12H] The Marketplace list should provide the possibility to generate a DOI for an item.
[ARQ MKP13C] The Marketplace service list can show an estimation of the total cost for the item, based on an a-priori evaluation of the costs for the hardware, software and data used on average by the service item.
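As an illustration of the search capabilities in [ARQ MKP11S], below is a minimal sketch of free-text and attribute filtering over service entries. The entry structure and field names are assumptions for illustration, not a prescribed data model.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    """Hypothetical Marketplace entry; fields loosely mirror the
    attributes listed in [ARQ MKP04S]."""
    name: str
    service_class: str                 # e.g. "Workflow", "App", "EO Data"
    description: str = ""
    tags: set[str] = field(default_factory=set)

def search(entries: list[ServiceEntry], text: str = "",
           service_class: str | None = None,
           tag: str | None = None) -> list[ServiceEntry]:
    """Free-text and attribute filtering over the service list."""
    text = text.lower()
    return [
        e for e in entries
        if (text in e.name.lower() or text in e.description.lower())
        and (service_class is None or e.service_class == service_class)
        and (tag is None or tag in e.tags)
    ]

catalogue = [
    ServiceEntry("NDVI Workflow", "Workflow", tags={"vegetation"}),
    ServiceEntry("Sentinel-2 L2A", "EO Data", tags={"optical"}),
]
assert [e.name for e in search(catalogue, tag="vegetation")] == ["NDVI Workflow"]
```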

2.6.2 Workspace

The Workspace component acts as a client of the Execution Environment, managing the execution of processing services via Execution Requests. This component shall comply with the following requirements:
[ARQ WKS01S] Workspace shall offer a web interface to create Execution Requests for running Apps and Workflows, defining the required input, processing parameters, associated ICT resources (e.g. the Execution Cluster to run on, ICT resource templates for an App) and information on where to publish the results (e.g. which Resource Management macro-component, which collection).
[ARQ WKS02S] Workspace shall integrate the Geo Resource Browser geo-located search functionalities for the selection of data and auxiliary products for Workflow and App executions.
[ARQ WKS03S] Workspace shall show only data and auxiliary products compatible with the App or Workflow to be executed.
[ARQ WKS04C] Workspace can provide the possibility to upload custom data (such as in-situ data) for the processing into an internal user data basket, to be used as auxiliary input for the processing. This data may be temporarily published and stored into the Resource Management macro-component.
[ARQ WKS05S] Workspace shall offer the possibility to chain Workflows, using Workflow results as inputs for other Workflows.
[ARQ WKS06S] Workspace shall give the possibility to monitor the execution of a Workflow, including Workflow status, Workflow step (job) status and percentage of completion.
[ARQ WKS07S] Workspace shall give the possibility to pause the execution of a processing service.
[ARQ WKS08S] Workspace shall give the possibility to stop the execution of a processing service.
[ARQ WKS09S] Workspace shall, prior to performing the request for the stop of a processing service, request confirmation from the user, alerting the user of the consequences of such a request (e.g. all non-saved results will be lost).
[ARQ WKS10S] Workspace shall give the possibility to restart a processing service.
[ARQ WKS11S] Workspace shall give the possibility to visualize processing service results, via the integrated Geo Resource Browser client.
[ARQ WKS12S] Workspace shall give the possibility to visualize the processing logs for the execution of a processing service.
[ARQ WKS13S] Workspace shall report the processing service usage of resources and quotas, via the APIs defined by the Execution Environment macro-component, providing the user with information on the consumed and available resources.
[ARQ WKS14S] Workspace shall provide the total cost estimation for a new processing service execution, prior to the processing service submission, based on the output of a job simulation of the Execution Request, performed via the Execution Environment API.
[ARQ WKS15S] Workspace shall provide the total cost for the processing service execution, after the processing service submission, based on the output of the Execution Request.


[ARQ WKS16S] Workspace shall give users the possibility to publish processing service output results into the Collaboration Bucket, to be shared with the community.
[ARQ WKS17S] Workspace shall give users the possibility to package a processing service execution into a Virtual Experiment, which contains all the information required to reproduce the processing service execution, following the standards described in the EP Virtual Experiment Packaging [EPVEP] document.
[ARQ WKS18S] Workspace shall give users the possibility to publish Virtual Experiments in the Collaboration Bucket.
[ARQ WKS19S] Workspace shall provide the user with an interface to access the Apps, obtained by integrating the App access web interface provided by the App Manager component.
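To make the shape of such a request concrete, here is a minimal, non-normative Python sketch of an Execution Request as the Workspace might assemble it per [ARQ WKS01S]. The field names (service, inputs, parameters, ict, publish) and all identifiers in the example are assumptions of this sketch; the actual request format is defined by the Execution Environment API.

import json

def build_execution_request(service_id, inputs, parameters,
                            execution_cluster, publish_to):
    """Assemble an Execution Request: input references, processing
    parameters, associated ICT resources and the publication target."""
    return {
        "service": service_id,                      # Workflow or App identifier
        "inputs": inputs,                           # catalogue references of input data
        "parameters": parameters,                   # service-specific parameters
        "ict": {"execution_cluster": execution_cluster},
        "publish": publish_to,                      # e.g. target collection
    }

if __name__ == "__main__":
    request = build_execution_request(
        service_id="sst-average-v1",
        inputs=["catalogue/collections/aatsr-l2/items/A2011-121"],
        parameters={"start": "2011-05-01", "end": "2011-05-31"},
        execution_cluster="cluster-ondemand-01",
        publish_to={"collection": "sst-results"})
    print(json.dumps(request, indent=2))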

2.6.3 Systematic Processing

The Systematic Processing component takes care of automatic Workflow submission within the infrastructure, to perform bulk or near-real-time processing. Considering the amount of resources required to run systematic processing, this interface is usually managed by the Operator, but access may also be given to the final user.

The approach taken by this architecture is to consider a general systematic processing as processing instantiated on behalf of the user upon the firing of pre-defined data-based or time-based triggers. In this view, bulk processing is performed by configuring time-based triggers associated with past time events, coupled with a maximum quota of concurrent processing executions. This way, the system will split the bulk processing into chunks, one per time-based event time interval. On the other hand, near-real-time processing can be performed via data-based events, triggered by the ingestion of new input data.

The Systematic Processing component shall meet the following requirements:

[ARQ SYP01S] Systematic Processing shall trigger a given Workflow execution according to a set of pre-defined conditions. Supported conditions shall include data-based triggers and time-based triggers.
[ARQ SYP02S] Systematic Processing data-based triggers shall instantiate new processing when a configurable amount of new data is ingested into a given collection of the Resource Management macro-component.
[ARQ SYP03S] Systematic Processing time-based triggers shall be repeated with a configurable time interval (e.g. hours, days, months) within a configurable start and end time. Each trigger shall instantiate new processing for all the data within the particular time interval associated to it (a minimal sketch of this chunking is given after this list).
[ARQ SYP04S] Systematic Processing shall interface with the Resource Management macro-component to retrieve information for data-based triggers.
[ARQ SYP05S] Systematic Processing triggers shall be restricted by a user-configurable limit on the number of concurrent Workflow executions.


[ARQ SYP06S] Systematic Processing shall provide a web interface to set up processing parameters and processing triggers.
[ARQ SYP07S] The Systematic Processing web interface shall integrate the Geo Resource Browser component to define the parameters of the processing.
[ARQ SYP08H] The Systematic Processing trigger setup web interface should be integrated in the Workspace component.
[ARQ SYP09S] Systematic Processing shall be able to automatically publish processing results into the Marketplace or into the Collaboration Bucket, into a given product collection.
[ARQ SYP10S] Systematic Processing shall monitor the user quota and alert the user if the current processing quota is not enough to cover the Systematic Processing needs.
[ARQ SYP11S] Systematic Processing shall allow the operator to restrict the Systematic Processing capabilities to a given group of users.
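The trigger semantics above ([ARQ SYP02S], [ARQ SYP03S], [ARQ SYP05S]) can be sketched as follows. This is an illustrative Python fragment under the stated assumptions, not an implementation mandated by the architecture.

from datetime import datetime, timedelta

def time_based_chunks(start, end, interval):
    """Split a bulk processing window into per-interval chunks: each
    trigger covers the data within its own time interval ([ARQ SYP03S])."""
    chunks, t = [], start
    while t < end:
        chunks.append((t, min(t + interval, end)))
        t += interval
    return chunks

def data_based_trigger_fires(new_items, threshold):
    """Fire when a configurable amount of new data has been ingested
    into the monitored collection ([ARQ SYP02S])."""
    return len(new_items) >= threshold

if __name__ == "__main__":
    max_concurrent = 2  # user-configurable concurrency limit ([ARQ SYP05S])
    chunks = time_based_chunks(datetime(2011, 1, 1), datetime(2011, 1, 8),
                               timedelta(days=1))
    print(len(chunks), "daily chunks, at most", max_concurrent, "running at once")
    print(data_based_trigger_fires(new_items=["p1", "p2", "p3"], threshold=3))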

2.6.4 Collaboration Bucket

The Collaboration Bucket component's main task is to stimulate collaboration between the users of the platform, in the form of sharing the results of processing services, related publications and other services provided by the platform. The Collaboration Bucket is conceived as a big bucket where users can publish results, services, processing services, references to scientific articles or anything else they want to share with the scientific community.

A Collaboration Bucket is in many ways similar to a Marketplace, with the difference that the services exposed within a Collaboration Bucket are not controlled or endorsed by the platform operator, may not be registered within the Resource Management macro-component, are always open and free services, have no pre-defined OLA and are not strictly categorized by discipline. As for the Marketplace component, the details of how to present the service entries in the Collaboration Bucket are mostly related to the particular deployment of the Exploitation Platform, therefore this architecture focuses only on the general functionalities of the Collaboration Bucket.

The Collaboration Bucket component shall meet the following requirements:

[ARQ CBK01S] Collaboration Bucket shall display a list of item entries, separated in classes and with configurable names: Workflows (e.g. named Workflow), Apps (e.g. named App), available input products and auxiliary products collections (e.g. named EO Data), sets of processing results (e.g. named Processing Results), documents from the Support Tools (e.g. named Articles), reproducible single processings (e.g. named Virtual Experiment).
[ARQ CBK02S] It shall be possible to extend the service list with the definition of new classes for service items tailored to a particular community.


[ARQ CBK03S] It shall be possible to organize the Collaboration Bucket service list by thematic categories and service class categories.
[ARQ CBK04S] The Collaboration Bucket list shall show the following general attributes: description, publisher, category, associated tags.
[ARQ CBK05S] The Collaboration Bucket list shall provide the user, upon request, with a link to the item reference, pointing to the Resource Management macro-component catalogue entry reference, if this exists.
[ARQ CBK06S] The Collaboration Bucket list shall redirect the user, upon request, to a link to access the item (e.g. to the Geo Resource Browser for EO Data and Processing Results, to the Workspace for Virtual Experiment, Service and App).
[ARQ CBK07S] Collaboration Bucket shall connect to the Resource Management macro-component, via the Resource Management macro-component API, to retrieve the list of items, the item reference links and other relevant information for all the resources displayed as service items that are registered in the Resource Management macro-component.
[ARQ CBK08C] Collaboration Bucket can integrate the Geo Resource Browser component for interfacing with the Resource Management macro-component.
[ARQ CBK09C] The Collaboration Bucket list can provide the possibility to vote on and comment on an entry, according to the Resource Management macro-component API, for the items registered in the Resource Management macro-component.
[ARQ CBK10S] Collaboration Bucket shall provide the user with capabilities to perform searches on the service list, as free text or filtered by any of the entry attributes (e.g. via tags), by the entry type (e.g. Processing Results, EO Data, Workflow or App) or by the entry category.
[ARQ CBK11S] Collaboration Bucket shall permit the publishing of processing results via the Workspace component, as new Result entries.
[ARQ CBK12S] Collaboration Bucket shall permit the publishing of Virtual Experiments via the Workspace component, as new Virtual Experiment entries.
[ARQ CBK13S] Collaboration Bucket shall permit the publishing of integrated Workflows and Apps via the Packaging Tool of the Application Integration macro-component, as new processing service entries.
[ARQ CBK14S] Collaboration Bucket shall permit the publishing of Geo Resource Browser queries defining a set of products via the Geo Resource Browser component, as new Processing Results entries.
[ARQ CBK15S] Collaboration Bucket shall permit organizing listed entries into collections.

2.6.5 Geo Resource Browser

The Geo Resource Browser is a component which is generally integrated inside the user-facing interfaces of the UAP, but can also be integrated by other components if required. The rationale behind this choice is to have a common client to the Resource Management API which can be integrated into multiple User Access Portal components. The scope of this component is therefore to provide a graphical user interface to access the resources registered on the platform, by communicating with the Resource Management macro-component. The client allows searching for resources, filtering, retrieving resource metadata, retrieving links to download the resource (when possible) and displaying the preview associated with the resource (a minimal sketch of such a query is given at the end of this section).

This component shall meet the following requirements:

[ARQ GRB01S] Geo Resource Browser shall provide a web interface to interact with the Resource Management macro-component.
[ARQ GRB02S] Geo Resource Browser shall support downloading the binary data associated with a Resource, if the user is authorized to do so by the Resource Access Gateway component of the Resource Management macro-component.
[ARQ GRB03S] Geo Resource Browser shall support visualization of the metadata associated with the resource, via the API of the Resource Management macro-component.
[ARQ GRB04S] Geo Resource Browser items shall provide a link to subscribe to search queries, so the user is informed when new data matching the query is registered, according to the EP Resource Catalogue Interface [EPRCS] document, query interface section.
[ARQ GRB05S] Geo Resource Browser shall support visualization of geo-located resource previews, displaying the preview over a world map.
[ARQ GRB06S] Geo Resource Browser shall support visualization of a legend and colour table for all the visualized previews.
[ARQ GRB07S] Geo Resource Browser shall support visualization of non-geo-located resource previews, such as static images (graphs, bars, etc…).
[ARQ GRB08S] Geo Resource Browser shall support visualization of statistics for resource spatial and temporal coverage, obtained via the Resource Management macro-component APIs.
[ARQ GRB09S] Geo Resource Browser shall support the possibility to package a query into resource sets, which contain all the information required to retrieve and download the list of resources and details on the resources themselves (e.g. description, documentation, etc…), following the standards described in the EP Virtual Experiment [EPVEP] document.
[ARQ GRB10S] Geo Resource Browser shall give users the possibility to publish a query as a Processing Results set in the Collaboration Bucket.
[ARQ GRB11C] Geo Resource Browser can give Data Providers the possibility to upload, publish and manage resources on the platform (e.g. auxiliary data for the processing), acting as an interface to the Resource Ingestion component of the Resource Management macro-component.


[ARQ GRB12H] Geo Resource Browser should provide the data quota status for resource download, per user, as provided by the Resource Access Gateway component.
[ARQ GRB13H] Geo Resource Browser should provide the data quota status for resource upload, per user, as provided by the Resource Ingestion component.
[ARQ GRB14C] Geo Resource Browser can integrate GIS capabilities to perform post-analysis of processing result resources.
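As a purely illustrative sketch of the kind of geo-located query the Geo Resource Browser issues against the Resource Management macro-component: the parameter names below (bbox, start, stop, q) follow common OpenSearch Geo/Time conventions and are an assumption of this sketch; the normative query interface is the one defined in [EPRCS].

from urllib.parse import urlencode

def build_catalogue_query(base_url, bbox=None, start=None, end=None,
                          free_text=None, count=20):
    """Build a geo-temporal search URL against the resource catalogue."""
    params = {"count": count}
    if bbox:
        params["bbox"] = ",".join(str(c) for c in bbox)  # west,south,east,north
    if start:
        params["start"] = start
    if end:
        params["stop"] = end
    if free_text:
        params["q"] = free_text
    return f"{base_url}/search?{urlencode(params)}"

if __name__ == "__main__":
    print(build_catalogue_query(
        "https://platform.example/catalogue",
        bbox=(-6.0, 30.0, 36.0, 46.0),  # Mediterranean basin
        start="2011-05-01", end="2011-05-31",
        free_text="sea surface temperature"))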

2.6.6 Support Tools

The Support Tools are a set of tools to ease the user's utilization of the platform services. These tools usually consist of: an issue tracking system, which takes care of user requests for bug fixes or new features; an interactive Q&A, where users can submit questions about the platform and get answers from the operator or the community; a CMS (e.g. a Wiki), which includes community-built documentation on how to use the platform services and a set of basic tutorials for platform usage; and collaborative development tools (e.g. GitHub) for collaborative development and integration of processing services.

In general, the Support Tools shall meet the following requirements:

[ARQ STL01S] Support Tools shall provide an Issue Tracking System, to take care of user requests for bug fixes or new features.
[ARQ STL02S] Support Tools shall provide an Interactive Q&A system, for the final user to submit questions which can be answered by other users or the Operator.
[ARQ STL03S] Support Tools shall provide a Content Management System, in the form of a Wiki and/or a Blog, where the Operator or the PI can include information about the processing services usage, linkable by the other UAP components.
[ARQ STL04S] Support Tools shall provide a collaborative development support tool, such as, for example, an online collaborative software repository (e.g. GitHub).
[ARQ STL05S] The Wiki shall include information on how to use the basic system functionalities, such as integrating a new processing service, starting, stopping or managing a processing service execution, and searching for input data or published results.
[ARQ STL06H] The Blog should be the placeholder for posts about generic usage of the platform, information related to platform evolution (e.g. what-is-new pages after a new platform release) or major achievements obtained via the platform (e.g. research projects, articles or other activities which benefitted from the platform services).

2.6.7 Management Console

The Management Console is the entry point for the Operator user to manage the platform. The implementation of this component may vary considerably across exploitation platforms, according to the level of integration of the Exploitation Platform components. In most implementations, this component will simply be a CMS with links to the management consoles of the single components and information on how to configure them.

Even though it is desirable from the Operator's point of view, a fully integrated management console for the entire Exploitation Platform is often not feasible, because this architecture does not enforce a common standard for management console access across the different components. Nevertheless, a Management Console should at least provide the features described by the following requirements:

[ARQ MGC01S] Management Console shall include information on how to configure and deploy every component of the platform.
[ARQ MGC02S] Management Console shall link to the platform components' management interfaces, where present.
[ARQ MGC03C] Management Console can integrate the platform components' management interfaces, where present.
[ARQ MGC04H] Management Console access should be restricted to AAI users with a high level of assurance, for example users logging in from trusted IPs or trusted network security zones, or using local credentials (private, dedicated, non-SSO credentials), etc…

2.6.8 Account and Monitor

The Account and Monitor component's task is to aggregate the accounted usage of the platform by the users and to monitor the availability of the platform components. This service acts as a collector of Usage Records and Status Information from the other components, logging respectively the usage and the status of the infrastructure services. The component then generates, based on the collected information, a set of custom status and usage reports. In particular, the component accounts and monitors processing, resource utilization, resource sharing activities and consumed ICT resources, interfacing with all the platform components.

The Account and Monitor component shall comply with the following requirements:

[ARQ ACM01S] The Account component shall support usage record gathering modules, to gather usage records from the different component implementations (a minimal sketch is given after this list).
[ARQ ACM02S] The Account component usage record gathering modules shall support gathering of usage records from the Execution Gateway component, for accounting processing service activities and ICT resource usage.
[ARQ ACM03S] The Account component usage record gathering modules shall support gathering of usage records from the Catalogue component, for accounting resource queries.
[ARQ ACM04S] The Account component usage record gathering modules shall support gathering of usage records from the Resource Access Gateway component, to account resource access for processing and download.
[ARQ ACM05S] The Account component usage record gathering modules shall support gathering of usage records from the Resource Ingestion component, to account the resource ingestion amount.

[ARQ ACM06S] The Account component usage record gathering modules shall support gathering of usage records from the Collaboration Bucket, to account the amount of data, services and products shared.
[ARQ ACM07S] Account shall report on the usage of the platform services in terms, at least, of services and resource collections, users, user groups and general platform usage.
[ARQ ACM08S] Account shall provide per-user information on the usage of the platform ICT resources in terms of billable items of the ICT resource usage. For example, for a Cloud environment, the Account component should be able to provide, per user, the amount of CPU, RAM, Disk, etc… consumed per processing on the single components of the platform.
[ARQ ACM09S] Account shall provide per-user information on the usage of the platform processing services. For example, the Account component should be able to provide, per user, the number of times a given processing service has been started, the total time it has been used and the list of processing services started by the user.
[ARQ ACM10S] Account shall provide per-user information on the usage of the platform resources, separated by resource type (e.g. input data, results, auxiliary data) and resource collection. For example, the Account component should be able to provide, per user, the number of times binary data from a given collection has been downloaded and the total size of the binary data downloaded.
[ARQ ACM11S] It shall be possible for the Data Provider to retrieve accounting information on the data he owns.
[ARQ ACM12S] The Monitor component shall report on the status of processing service submission, resource availability and user-facing interface availability, monitoring the status of all the involved components.
[ARQ ACM13S] The Monitor component's reported status indicators shall cover the load of the component, the load of the ICT resources where the component is deployed, the availability of the component's external interfaces and the correct response of those interfaces.
[ARQ ACM14S] The Monitor component shall have a modular approach, supporting custom status gathering modules for multiple components, multiple component-provided external interfaces and different status indicators (e.g. OK, Warning, Critical).
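The modular gathering approach of [ARQ ACM01S] and the per-user aggregation of [ARQ ACM08S] can be sketched as follows. This is a toy Python illustration: the record fields, module interface and metric names are assumptions, not part of the architecture.

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class UsageRecord:
    """A normalised usage record as the Account component might store it."""
    user: str
    component: str   # e.g. "Execution Gateway", "Resource Access Gateway"
    metric: str      # e.g. "cpu_hours", "bytes_downloaded"
    value: float
    timestamp: datetime

class GatheringModule:
    """Base class for per-component gathering modules ([ARQ ACM01S])."""
    def gather(self):
        raise NotImplementedError

class ExecutionGatewayModule(GatheringModule):
    """Accounts processing activity and ICT usage ([ARQ ACM02S])."""
    def __init__(self, raw_jobs):
        self.raw_jobs = raw_jobs
    def gather(self):
        now = datetime.now(timezone.utc)
        return [UsageRecord(j["user"], "Execution Gateway", "cpu_hours",
                            j["cpu_hours"], now) for j in self.raw_jobs]

def per_user_totals(records):
    """Aggregate billable items per user ([ARQ ACM08S])."""
    totals = {}
    for r in records:
        key = (r.user, r.metric)
        totals[key] = totals.get(key, 0.0) + r.value
    return totals

if __name__ == "__main__":
    module = ExecutionGatewayModule([{"user": "alice", "cpu_hours": 3.5},
                                     {"user": "alice", "cpu_hours": 1.0}])
    print(per_user_totals(module.gather()))  # {('alice', 'cpu_hours'): 4.5}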

2.6.9 AAI

The Authorization and Authentication Infrastructure component aims to provide the platform services with a shared authentication and authorization framework. The scope of this component is to ensure consistent authorization and user profiles across the platform. It shall be noted that this component does not aim to replace the authorization logic of the other components of the infrastructure, but to provide them with the information required to make proper authentication and authorization decisions.


Several implementations exist that provide an AAI infrastructure. However, to be usable within the Exploitation Platform, the AAI shall comply with the following requirements:

[ARQ AAI01S] AAI shall support multiple Identity Providers (IDPs), which shall be the only entities entitled to manage the identities of the users and to store the users' permanent access credentials.
[ARQ AAI02S] AAI shall support multiple IDP technologies (e.g. SAML2, OpenID, etc…) via different access modules.
[ARQ AAI03S] No permanent access credentials shall be exchanged directly between the components and the user; access credentials shall be exchanged only between the user and the designated Identity Provider.
[ARQ AAI04S] AAI shall provide Single-Sign-On capabilities, meaning that a user who is logged in on one platform component is automatically logged in on another platform component without the need to re-enter the authentication credentials.
[ARQ AAI05S] AAI shall provide basic user profile information, such as the user UUID and verified emails.
[ARQ AAI06H] AAI should provide additional optional user profile information, such as the user's Name, Surname, affiliated Institution, IDP name and Country.
[ARQ AAI07S] AAI shall provide additional user profile information for Authorization, such as user Group, data access permissions, etc…
[ARQ AAI08C] AAI can provide additional custom optional profile information, such as contact address and contact phone number.
[ARQ AAI09S] AAI shall be able to link, in a manual (on request of the user) or semi-automated way (upon confirmation of the user), different user identities coming from multiple IDPs into one single UUID, based on common profile information (e.g. same email, same Name and Surname).
[ARQ AAI10S] AAI shall support standard web-browser interface authentication via session cookies.
[ARQ AAI11S] AAI shall support generic API authentication, without the need to include permanent credentials within the API client, by using pre-generated access tokens.
[ARQ AAI12S] AAI shall support multiple token translation technologies (e.g. Keystone 2.0, X509 Proxy Certificates, etc…), to support different API authentication methods for each component.
[ARQ AAI13S] AAI shall give the user the possibility to revoke generated API access tokens.
[ARQ AAI14S] AAI shall support user delegation, e.g. allowing a given trusted component to authenticate to another component on behalf of a given user, without the user needing to insert his own credentials (a minimal sketch is given after this list).


[ARQ AAI15S] AAI user delegation shall be provided under direct approval of the user, if one of the source or end components in the delegation trust chain is external to the Platform.
[ARQ AAI16S] AAI shall give the user the possibility to revoke delegations.
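The token-based API authentication ([ARQ AAI11S]), delegation ([ARQ AAI14S]) and revocation ([ARQ AAI13S], [ARQ AAI16S]) behaviour can be illustrated with the following toy Python sketch. A real AAI would rely on standard technologies (e.g. SAML2, OpenID, token translation services) rather than this in-memory store; everything below is an assumption for illustration only.

import secrets
from datetime import datetime, timedelta, timezone

class TokenService:
    """Toy issuer of pre-generated access tokens, including delegation."""
    def __init__(self):
        self._tokens = {}

    def issue(self, subject, audience, on_behalf_of=None, ttl_hours=12):
        token = secrets.token_urlsafe(16)
        self._tokens[token] = {
            "subject": subject, "audience": audience,
            "delegated_for": on_behalf_of,  # set when acting on behalf of a user
            "expires": datetime.now(timezone.utc) + timedelta(hours=ttl_hours)}
        return token

    def validate(self, token, audience):
        info = self._tokens.get(token)
        if not info or info["audience"] != audience:
            return None
        if datetime.now(timezone.utc) > info["expires"]:
            return None
        return info["delegated_for"] or info["subject"]

    def revoke(self, token):
        self._tokens.pop(token, None)

if __name__ == "__main__":
    aai = TokenService()
    # The Execution Gateway authenticates to the Resource Access Gateway
    # on behalf of user "alice", without ever seeing her credentials.
    t = aai.issue("execution-gateway", "resource-access-gateway",
                  on_behalf_of="alice")
    print(aai.validate(t, "resource-access-gateway"))  # alice
    aai.revoke(t)
    print(aai.validate(t, "resource-access-gateway"))  # None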


3 END-TO-END SCENARIOS

This chapter details common Exploitation Platform usage scenarios in relation to the Architecture defined in the previous chapter, highlighting the components and interfaces involved in each scenario. The scope of this chapter is to clarify the contribution of each component to the overall platform usage, in order to drive the component implementations. To this end, the usage scenarios depicted in this chapter may contain samples related to particular thematic utilizations of the platform.

3.1 EO Data Exploitation

The EO Data Exploitation scenario represents the Final User's usage of the platform services, from the discovery/selection of data resources, through processing service execution and visualization of the results, to the execution of analysis tools. In this scenario, final users may have very different levels of knowledge about the theme behind the particular Exploitation Platform, the EO data included in it or even the general concept of EO satellite data processing. Users can range from specialists in the Exploitation Platform theme and in EO satellite data analysis to non-scientific users coming from other domains (journalists, enthusiasts, general users). It is therefore very important for the Exploitation Platform to provide an initial entry point capable of guiding users towards the data, products and processing services most suitable for their needs and expertise. This task is performed by the Marketplace component, which acts as a showcase for the platform capabilities. In order to increase the Platform's outreach to users from heterogeneous domains, it is very important for the Marketplace component to present information in a simple way, with clear descriptions of the platform services, even for non-scientific users.

From the Marketplace, the user is then redirected to the particular platform service he is interested in. For each of these services, we will have different sub-scenarios and different platform components involved. As an example, a user interested in “Sea Temperature” will enter the Marketplace, search for “Sea Temperature” in the portfolio of the platform services, and be presented with the results displayed in Figure 3-1. At this point, he can identify a service he is interested in: for example, he may decide to access one of the already generated products representing Sea Temperature over a given AOI, or decide to run his own processing via a custom call to the Sea Surface Temperature Workflow, or his own analysis running the ENVI App. In the next paragraphs, we will detail the sub-scenarios for each of these platform services. For the sake of simplicity, in the following descriptions we will omit two components which always underlie the platform services: the AAI component (providing authorization for the user in each service) and the Account and Monitor component (providing records of the platform usage and overall status of the platform).

As can be seen at the end of Figure 3-1, the Marketplace should provide a link for the user to access the Collaboration Bucket, to have a wider list of available services. The Collaboration Bucket, in fact, contains Processing Results, Workflows and Apps produced by the community which have not been validated. The case of the user reaching a service from the Collaboration Bucket instead of the Marketplace, however, does not change the sub-scenarios described below.

Search for: Sea Temperature

Processing Results:
- Daily Sea Temperature over Mediterranean — Average Sea Temperature over Daytime and Nighttime on the Mediterranean basin
- Daily Sea Temperature over US West Coast — Average Sea Temperature over Daytime and Nighttime on the United States West Coast

EO Data:
- ENVISAT AATSR Level 2 SST (operational) — Sea Surface Temperature operational product
- ENVISAT AATSR Level 1P (operational) — AATSR Top of the Atmosphere calibrated radiances. Tags: [Sea] [Temperature] [SST] […]
and other 13 entries (Show)…

Workflows:
- Sea Surface Temperature average estimation — Estimates average, maximum and minimum sea temperature from a wide set of Satellite data sources over a given time interval. Tags: […]
- BEAM Binning — Calculate generic BEAM Arithmetic Expressions, and perform Level 2 products binning to produce Level 3 products over a custom AOI and time interval. Tags: [Sea] [Temperature] [SST] […]

Apps:
- ENVI — ENVI combines advanced image processing and proven geospatial analysis technology to help you extract meaningful information and make better decisions. Tags: [Sea] [Temperature] [SST] [Data visualization] […]
and other 6 entries (Show)…

Articles:
- [Blog] A study of CCI on the marine ecosystem — […] sea surface temperature of the Mediterranean basin is rising over […].
- [Q&A] Which is the best sensor to calculate Sea Temperature from satellite? — The ENVISAT AATSR sensor has been designed with the specific purpose to calculate […]
- [Wiki] How to use the BEAM Binning service — In order to calculate the sea surface temperature, the BEAM arithmetic equation shall be set to […]
and other 20 entries (Show)…

Not found what you are searching for? Have a look at our Community Services in the Collaboration Bucket.

Figure 3-1 EO Data Exploitation usage scenario: Marketplace discovery interface results mock-up


Figure 3-2 Product usage scenario flow: (1) request for a given processing result via the Geo Resource Browser; (2) query to the Catalogue; (3) metadata returned (including Preview and Resource Location); (4) download from the Resource Location via the Data Access Gateway (Internal/External repository)

3.1.1 Processing Results

The components involved in the EO Data Exploitation, Processing Results usage scenario are shown in Figure 3-2. As the diagram shows, in this scenario (1) the user is redirected by the Marketplace to the Geo Resource Browser component, with a pre-defined search for the given set of Processing Results (e.g. “Daily Sea Temperature over Mediterranean”). (2) The Geo Resource Browser forwards the search Query for the results to the Catalogue component, which (3) returns the data resources, their Preview and, if the user is entitled to download the binary data, the location of the binary data itself. At this point, entitled users can (4) download the binary data from this Resource Location, via the Resource Access Gateway component, which connects to the underlying repository to retrieve the binary data. Alternatively, the user can subscribe to the Geo Resource Browser query (e.g. via an RSS feed). The subscription will provide him with the latest version of the products and their metadata.

The standard product request the user is redirected to from the Marketplace may also be modified by the user on the data browser. For example, for the “Daily Sea Temperature over Mediterranean”, the user will probably be redirected to the latest product produced (e.g. the day before), but he will then be able to select another date and obtain the archived products. This will depend on the particular configuration of the Product collections and will always involve the Geo Resource Browser and the Catalogue. The products belonging to the given collection can be static results generated or uploaded by the PI but, most of the time, they will be products generated by the platform itself via on-demand or Systematic Processing. This scenario is detailed in the New EO Product Development usage scenario described in paragraph 3.3.
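Steps (2)–(4) above can be mimicked in a few lines of illustrative Python; the component objects below are stand-ins for the real Catalogue and Resource Access Gateway APIs, whose actual interfaces are defined elsewhere in this architecture.

def discover_and_download(query, catalogue, access_gateway, user):
    """Figure 3-2 flow: (2) query the Catalogue, (3) read metadata
    including the Resource Location, (4) download via the Resource
    Access Gateway if the user is entitled."""
    downloads = []
    for item in catalogue.search(query):                       # (2), (3)
        location = item.get("resource_location")
        if location and access_gateway.is_entitled(user, item):
            downloads.append(access_gateway.fetch(location))   # (4)
    return downloads

class FakeCatalogue:
    def search(self, query):
        return [{"id": "sst-med-20110501",
                 "preview": "sst-med-20110501.png",
                 "resource_location": "store://results/sst-med-20110501.nc"}]

class FakeGateway:
    def is_entitled(self, user, item):
        return True
    def fetch(self, location):
        return f"bytes of {location}"

if __name__ == "__main__":
    print(discover_and_download("Daily Sea Temperature over Mediterranean",
                                FakeCatalogue(), FakeGateway(), user="alice"))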

3.1.2 EO Data

Figure 3-3 EO Data usage scenario flow: request for EO data via the Geo Resource Browser; query to the Catalogue; metadata (including Preview) and Statistics returned; links forwarding to the Workflow and App usage scenarios; EO data ingested via the Resource Ingestion component

As can be seen from Figure 3-3, if the user selects EO Data (input data), the usage workflow will be slightly different from the Product one. This is because EO data is usually not downloaded but used as input for processing services. For EO Data, the Geo Resource Browser will provide links to the Workflows and Apps which can use that EO data as input. If the user clicks one of these links, he will be redirected to the Workflow or App usage scenarios described in the next paragraphs.

Since the EO data is used as input for a custom processing, it is very important for the user to understand the coverage of the data over his Area Of Interest, so that he can select the proper input products. That is why, in this case, the Geo Resource Browser also integrates and displays the information from the Statistics Generator sub-component of the Catalogue component, in the form of coverage maps for the input data.

In this scenario, EO data is usually ingested into the platform from external repositories, configured by the Operator, via the Resource Ingestion component of the Resource Management macro-component.

3.1.3 Workflow

After a click on a Workflow in the Marketplace, the user is redirected to the Workspace, where he can submit the processing for the related processing service. The flow for this usage scenario is depicted in Figure 3-4. The Workspace integrates the Geo Resource Browser component to (1) query and (2) select the input EO data for the processing using the metadata and statistics information. After this, (3) the data reference is returned to the Workspace and the user is asked to enter the processing parameters, which depend on the particular service, and then submits the processing.


Figure 3-4 Workflow usage scenario flow: (1)–(3) input selection via the Geo Resource Browser and Catalogue; (4) Execution Request to the Execution Gateway; (5) forwarding to the Workflow Manager; (6)–(7) auxiliary data resolution via the Catalogue; (8)–(9) data retrieval on the Execution Cluster via the Data Access Gateway; (10)–(11) results pushed to Resource Ingestion and the Catalogue; (12) processing logs and results reference to the Workspace; (13) results visualization via the Geo Resource Browser

Upon submission, the Workspace (4) sends an Execution Request to the Execution Environment macro-component, which is handled by the Execution Gateway. Since the processing service is a Workflow, the Execution Gateway will (5) forward the request to the Workflow Manager, including the input EO data reference obtained, the processing parameters and the selected Execution Cluster to run the processing. The Workflow Manager (6) will query the Catalogue to retrieve any additional auxiliary data required for the processing and (7) resolve the metadata and binary data locations. The input and auxiliary data locations are (8) sent to the Execution Cluster, together with the details of the Workflow Software to be executed and its parameters. The Execution Cluster (9) will then use the data resource locations to retrieve the binary data from the Internal or External repositories, by interfacing with the Resource Access Gateway component. In case the Workflow is not deployed on the Execution Cluster, the Workflow Manager will also repeat steps (6) to (9) above to resolve the Workflow binary package resource location and forward a preparation Job to the Execution Cluster to deploy the Workflow software application (always retrieved via the Resource Access Gateway).

After the end of the processing, the Workflow Manager will gather the output and push the results (10) to the Resource Ingestion component, to (11) include them in the Catalogue and in an Internal Repository or a user-defined External Repository, such as the user's Google Drive or FTP server. (12) The final processing logs, including the resource catalogue reference, are then pushed to the Workspace, which will show the results. The results visualization is performed by the integrated Geo Resource Browser, which (13) retrieves from the Catalogue, using the results reference, the preview and the binary resource data location.

The overall time between the workflow submission and the publication of the results may be in the order of days or weeks. Thus, the Workspace monitoring of the job status is an activity usually performed in the background. The user will then periodically connect to the Workspace to check the job status. The final processing logs can be reported to the user by the Workspace, to debug the processing service execution.

At the end of the processing, the user is presented with the possibility to share the output into the Collaboration Bucket, by adding a title, a description and a collection where he wants to publish (e.g. sea surface temperature processing results). The result will then become a product entry in the Collaboration Bucket, and can be accessed by the final user according to the Product usage scenario described in 3.1.1. Also, from the Workspace, the user can create a Virtual Experiment, which will reproduce the completed job, and share it with collaborators or link it into his own articles.
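A Virtual Experiment, as used above, bundles the Execution Request together with the references of the produced results. The following Python sketch shows one possible shape of such a bundle; the field layout and identifiers are assumptions of this sketch, while the normative packaging format is the one defined in [EPVEP].

import json

def package_virtual_experiment(execution_request, results_refs, title, description):
    """Bundle everything needed to reproduce a completed processing
    service execution and share it with collaborators."""
    return json.dumps({
        "type": "VirtualExperiment",
        "title": title,
        "description": description,
        "execution_request": execution_request,  # service, inputs, parameters
        "results": results_refs,                 # catalogue references of the outputs
    }, indent=2)

if __name__ == "__main__":
    print(package_virtual_experiment(
        execution_request={"service": "sst-average-v1",
                           "inputs": ["catalogue/collections/aatsr-l2/items/A2011-121"],
                           "parameters": {"start": "2011-05-01", "end": "2011-05-31"}},
        results_refs=["catalogue/collections/sst-results/items/sst-avg-med-201105"],
        title="Mediterranean SST, May 2011",
        description="Average SST computed from AATSR L2 products."))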

3.1.4 App

The App usage scenario is similar to the Workflow one, with the main difference that, in the case of an App, there is no selection of input data and processing parameters in the Workspace before the execution of the processing service; instead, the input data, together with the processing operations to be executed, are interactively selected by the user via the App interface. The App usage workflow is depicted in Figure 3-5.

Figure 3-5 App usage scenario flow: (1) Execution Request from the Workspace to the Execution Gateway; (2) forwarding to the App Manager; (3) App package deployment and start on the Execution Cluster; (4) App access interface returned to the Workspace; (5)–(7) data queries and retrieval via the Catalogue and Data Access Gateway; (8)–(9) results published via Resource Ingestion; (10) processing logs and results references; (11) results visualization via the Geo Resource Browser

The App is always instantiated via the Workspace, where the user requests the start of the App. (1) The start request is sent to the Execution Gateway of the Execution Environment, which analyses the request, selects the best Execution Cluster and (2) forwards it to the App Manager. (3) The App Manager will then run the App software package on the Execution Cluster. If the App software package is not already deployed on the Execution Cluster, the App Manager will resolve

the App binary package resource location and execute a preparation Job on the Execution Cluster to deploy the App software application, retrieved via the Resource Access Gateway.

At this stage, the App software is running on the Execution Cluster and is ready to interact with the user to perform processing operations. The App Manager (4) forwards the App access web interface reference to the Workspace, which provides it to the user. During the processing, the user can use the App Manager abstraction functionalities to (5) execute Queries to look for input data, auxiliary data and other resources, (6) access the data Metadata, including the data location, and, ultimately, (7) retrieve input and auxiliary data from the data repositories via the Resource Access Gateway of the Resource Management macro-component.

During the processing itself or at the end of the processing, the user can, via the App access web interface, (8) request the publishing of output products (Results) into the Resource Management macro-component. This is performed via a call to the Resource Ingestion component, which will then (9) store the Results metadata in the Catalogue and the binary data in an Internal Repository or a user-defined External Repository, such as the user's Google Drive or FTP server. In some App Manager implementations, the publishing of the Results is not done interactively: the user places the Results into a particular output folder of the virtual App file system and these are automatically pushed to the Resource Ingestion component on App closure.

At the end of the processing, the user requests for the App to be stopped in the Workspace. This request is forwarded to the App Manager, which removes the App from the execution environment, (10) sends the final processing logs and the published results references to the Workspace and frees the App resources. Since all the user's results are removed from the Execution Cluster after App closure, it is very important for the user to publish all the important data beforehand. All the results generated by the App, whose references have been forwarded to the Workspace, will be visible via the integrated Geo Resource Browser component, which (11) gathers the Results preview and binary data location from the Catalogue.

The App processing results are no different from the Workflow processing results; thus, like in the Workflow usage scenario, the user is presented with the possibility to share the results into the Collaboration Bucket, by adding a title, a description and a collection where he wants to publish (e.g. sea surface temperature products). The result will then become a product entry in the Collaboration Bucket, and can be accessed by the final user according to the Product usage scenario described in 3.1.1. Also, from the Workspace, the user can create a Virtual Experiment, which will reproduce the completed job, and share it with collaborators or link it into his own articles.

During the overall App processing execution, the App Manager component will monitor the status of the App. Moreover, the App Manager, to minimize the costs of running the App, can pause the App when it is not actively used by the user (e.g. when the user logs out from the App web interface or the App generates no load on the reserved computing ICT resources). In this case, the user, after reconnecting with the Workspace, will find the App paused and will need to restart it to resume his work.
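The pause-on-idle behaviour of the App Manager described above can be sketched as a small state machine. This is an illustrative Python fragment, with the idle threshold and method names chosen for this sketch only.

from datetime import datetime, timedelta, timezone

class AppSession:
    """Toy App lifecycle: running Apps with no user activity and no
    compute load are paused to minimize ICT costs, and resumed from
    the Workspace on demand."""
    def __init__(self, idle_limit=timedelta(minutes=30)):
        self.state = "running"
        self.idle_limit = idle_limit
        self.last_activity = datetime.now(timezone.utc)

    def heartbeat(self):
        """Called on user interaction or measurable compute load."""
        self.last_activity = datetime.now(timezone.utc)

    def tick(self, now=None):
        """Periodic App Manager check: pause idle Apps."""
        now = now or datetime.now(timezone.utc)
        if self.state == "running" and now - self.last_activity > self.idle_limit:
            self.state = "paused"

    def resume(self):
        """Restart requested by the user via the Workspace."""
        if self.state == "paused":
            self.state = "running"
            self.heartbeat()

if __name__ == "__main__":
    app = AppSession()
    app.tick(now=datetime.now(timezone.utc) + timedelta(hours=1))
    print(app.state)  # paused
    app.resume()
    print(app.state)  # running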


3.1.5 Article

By clicking on the Blog, Wiki or Q&A Article entries in the Marketplace, the user is redirected to the Support Tools article of interest. These articles may include links to Processing Results or Virtual Experiments. Upon clicking a Processing Result, the user is redirected to the Geo Resource Browser, as per the Product usage scenario described in 3.1.1. Upon clicking a Virtual Experiment, instead, the user is redirected to the Workspace, with the details of the Virtual Experiment processing service pre-configured. The user can therefore visualize the details of the Virtual Experiment, modify them and eventually resubmit the processing service. The execution of a processing service starting from a Virtual Experiment follows the same usage scenario as described in the Workflow and App usage scenarios.

3.2 New EO Service Development

The main actor of the New EO Service Development usage scenario is the Principal Investigator (PI). This user has a deep knowledge of Satellite Data Processing in his particular scientific area of expertise and has a software application, developed by himself or other collaborators, which he wishes to integrate into the platform to perform on-demand processing, systematic processing or data analysis tasks, or simply to share the application with the scientific community. This usage scenario covers the overall work of the PI to integrate the application and all its dependencies into the platform, validate it and deploy it. A representation of the components involved in the New EO Service Development usage scenario is depicted in Figure 3-6.

Figure 3-6 New EO Service Development usage scenario flow: (1) documentation via the Support Tools; (2) import of an existing Workflow/App via the Import Tool; (3)–(5) test builds via the Test/Debug and Packaging Tools of the DevBox; (6) test Execution Requests to the (Test) Execution Environment; (7) auxiliary data upload to the Resource Management macro-component; (8) final packaging; (9) publication in the Collaboration Bucket; (10) promotion to the Marketplace



The first step for the PI is the instantiation of the integration environment used to integrate his software application within the platform. This environment is provided by the DevBox component. The DevBox instantiation can be done within the platform itself, using platform-provided resources, or locally, using the PI's own resources (e.g. a laptop). In the first case, the DevBox can be considered an App (since the integration process is an interactive processing), and the DevBox will be instantiated via the normal App usage scenario described in paragraph 3.1.4. In the second case, the DevBox can be a virtual machine or a standalone desktop application, which the user can download and run locally. The information on how to instantiate a DevBox is (1) provided via the Support Tools (e.g. in the form of a Wiki page), together with the documentation on how to integrate a software application via the tools provided by the DevBox.

A user may start the integration from a blank project or from samples. He can also start from an existing Workflow or App and modify it to his needs. In this latter case, (2) the PI uses the Import Tool component to import an App or a Workflow from the Resource Management macro-component into the DevBox.

During the integration process, the PI will need to test the processing by running a test version of the processing service. This feature is provided (3) by calling the Test/Debug Tool. This component will (4) interface with the Packaging Tool to produce a test version of the Workflow or App package and (5) upload it to the Resource Management macro-component. After the test application is uploaded to the Resource Management, the Test/Debug Tool (6) will send a test Execution Request to the test Execution Environment. The Test/Debug Tool acts here as a simulated Workspace component, according to the same usage scenarios defined in 3.1.3 and 3.1.4 for running a Workflow or an App. The difference between the Workspace component and the Test/Debug Tool is that the Test/Debug Tool will start from a pre-defined test case specified by the PI and will always provide detailed debug information on the overall application execution. The test Execution Environment to be used is usually decided by the operator, and is usually a separate deployment of the operational Execution Environment (see the security considerations in paragraph 3.2.2).

So far, we have assumed that all the auxiliary and input data needed by the application are already present in the platform, so that the (Test) Execution Environment can resolve them as per a normal processing service execution. This is sometimes not the case, since the software application may require auxiliary or particular input data not yet available on the platform. In this scenario, the PI therefore also acts as a Data Provider, (7) uploading the additional required data to the Resource Management macro-component. This is handled as a normal resource ingestion, via the Resource Ingestion component APIs. The DevBox can provide a tool to ease this upload, or redirect the user to the data resource upload interface of the Geo Resource Browser.

After the test is successful, the PI can (8) use the Packaging Tool component to build the final App or Workflow package, upload it to the Resource Management macro-component


as the final version of the processing service and, eventually, (9) publish it into the Collaboration Bucket to be used by himself or the community. From this point, the normal Workflow or App usage scenario applies, as described in paragraphs 3.1.3 and 3.1.4. These can be used by the PI himself, his collaborators or other users to validate the application. After the validation is complete, the Operator can (10) decide to include the application in the official platform services advertised in the Marketplace.

3.2.1 Processing Service updates

The publication of an App or a Workflow is not the end of the PI's integration work. In fact, the PI can still perform the New EO Service Development scenario to integrate an updated version of the same processing service. In this case, the PI may or may not still have the DevBox instance of the Service Integration environment available from the previous integration. If the PI no longer has the original DevBox environment, he can always recreate it by instantiating a new DevBox and using the Import Tool to import the latest stable or test version of the processing service.

After updating the processing service software executable, the PI will create a new stable version of the processing service. If the processing service is already stored within the Collaboration Bucket or the Marketplace, the new version will not be automatically presented to the user. The PI will need to explicitly release the new version via the Collaboration Bucket and the Operator will need again, after validation, to release the new version in the Marketplace (a minimal sketch of this versioning behaviour follows). The PI can also have the right to keep the old version online together with the new one.
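The release behaviour described above, where a newly integrated version stays invisible until explicitly released and older versions may remain online, can be sketched as follows; the registry class and method names are assumptions of this illustration, not part of the architecture.

class ServiceVersionRegistry:
    """Toy version registry for processing service packages."""
    def __init__(self):
        self.versions = {}  # (service, version) -> {"released": bool}

    def upload(self, service, version):
        """A newly integrated version is not presented to users yet."""
        self.versions[(service, version)] = {"released": False}

    def release(self, service, version):
        """Explicit release by the PI (and, for the Marketplace, the Operator)."""
        self.versions[(service, version)]["released"] = True

    def visible_versions(self, service):
        """Old versions may be kept online together with the new one."""
        return sorted(v for (s, v), info in self.versions.items()
                      if s == service and info["released"])

if __name__ == "__main__":
    reg = ServiceVersionRegistry()
    reg.upload("sst-average", "1.0")
    reg.release("sst-average", "1.0")
    reg.upload("sst-average", "2.0")            # integrated but not yet released
    print(reg.visible_versions("sst-average"))  # ['1.0']
    reg.release("sst-average", "2.0")
    print(reg.visible_versions("sst-average"))  # ['1.0', '2.0']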

3.2.2 Security considerations

For the sake of simplicity, the New EO Service Development usage scenario described in the paragraphs above neglects several security considerations. The requirement of full flexibility in integrating any PI application into the platform is implemented in this architecture without limitation or control on the kind of code the PI may run. The code underlying the PI application is entirely in the control of the PI himself, and he may even choose not to upload it to the Platform, integrating the software application starting from its binary. Since the PI software will then run on the Platform premises, the integration of a new EO Service is a very sensitive process in terms of security.

In the described usage scenario, the PI does not need a particular authorization to start a DevBox, because the DevBox can be packaged and executed directly on the PI's premises, where no access policies can be enforced by the platform. It is therefore highly important that the DevBox contains no hardcoded authentication or authorization information.

The PI will then need authorization to access the Execution Environment for testing purposes. This step will require acceptance of Terms and Conditions, releasing the platform provider from responsibility for malicious code execution. The PI will need

also to be authorized to perform data upload into the Resource Management component, an operation which also requires acceptance of dedicated Terms and Conditions, releasing the platform provider from responsibility for illegal data uploaded to the platform. Both these authorizations are granted manually by the Operator on the PI user account, and stored in the AAI component (in case general authorization is implemented) or in the particular components (Resource Ingestion for data/Workflow/App upload, and Execution Gateway for test job execution).

At Execution Gateway level, authorization implies configuration of the source Workflow/App Catalogue reference associated with the new processing service, with the possibility of running the service over test Execution Clusters. This permits untested processing services to run in a separate environment and not disturb the normal operational processing services. At the end of the integration, when the final version is published, the Operator will need again to manually configure the processing service in the Execution Gateway to access the operational Execution Cluster. This step is also required to ensure control over which software applications run on the platform's operational environment. The same manual configuration is performed by the Operator in the Resource Ingestion component for the PI to be able to upload additional auxiliary or input data: a given collection will be created by the Operator for the PI and a quota assigned to the collection.

When the PI invokes the Test/Debug Tool, the Packaging Tool is called to create a package for a test version of the application and upload it to the Resource Management macro-component. For this operation, the PI authenticates directly to the Resource Ingestion component. By default, the test version package is private and access permissions are granted only to the PI himself (the package owner). To share the work with other users, giving them the possibility to test the software early or edit it, the PI may publish the software in the Collaboration Bucket even when it is in alpha stage, may directly assign, via the Packaging Tool, download rights to his collaborators (who can then create a new DevBox environment and import the package), or may clone or grant access to his own DevBox instance to his collaborators.

After the production of the test package, the Test/Debug Tool requests the execution of the App or Workflow on the test Execution Environment. The PI will need to authenticate again for this request on the Execution Gateway, using his own credentials. At this point, the Execution Gateway will authenticate on behalf of the user to the Resource Access Gateway (to download the test software package and gather the data for the processing) and to the Workflow/App Manager (to perform processing service submission and package deployment). This authentication is performed with the direct aid of the AAI component, which will generate a token for the Execution Gateway to authenticate to the other components on behalf of the user.

During the integration of a software application, the PI may wish to use a Workflow or an App developed by another user and published in the Collaboration Bucket or the Marketplace. For this, in the scenario above, the PI invokes the Import Tool. The import

may be performed in two ways: linking the Workflow/App, or downloading and unpackaging it (for editing). For the first, the PI (and further users of the integrated Workflow/App) needs authorization to run the processing service; for the second, the PI needs authorization for download.

The final step of the integration is the publication of a final stable version of the application into the Collaboration Bucket. Processing service packages of stable versions are, by default, publicly available for download and processing. However, via the Packaging Tool, the PI may require the processing service package to be private, allowing download and/or processing only to selected users.

3.3 New EO Product Development

In the New EO Product Development usage scenario, a PI creates a new processing results collection on the platform. There are several ways this can be performed, with different components involved, which are summarised in Figure 3-7.

Figure 3-7 New EO Product Development usage scenario flow: (1) systematic processing via the Systematic Processing component and Execution Requests to the Execution Environment; (2) manual processing via the Workspace; (3) direct Processing Results upload via Resource Ingestion and publication via the Geo Resource Browser into the Collaboration Bucket; (4) promotion to the Marketplace

To produce the results and store them within the Platform, the PI has three possible approaches, illustrated in the top part of the figure. (1) The first and most used is to configure a bulk or near-real-time processing via the Systematic Processing component. This component will then send Execution Requests to the Execution Environment, in a similar way to the Workflow usage scenario described in paragraph 3.1.3. Via the Systematic Processing, the PI can only run Workflow processing services, because App processing services require the direct interaction of the PI, which cannot be emulated by the Systematic Processing component. (2) Another possible method is to perform manual processing via the Workspace. This is the only way to generate processing results using App processing services, and is useful when there is no easy way to define a systematic processing. This kind of processing is identical to the one considered in paragraphs 3.1.3 and 3.1.4. (3) Last, the user may upload the processing results directly, because, for example,

they were generated by the PI outside the platform. In this view, the PI acts as a Data Provider. Considering the size of a product collection, as per the New EO Service Development scenario, the PI will need to be explicitly authorized by the Operator to upload Processing Results into the platform. This step is performed manually by the Operator within the Resource Ingestion component, where a given collection will be created by the Operator for the PI and a quota assigned to the collection.

Once the products are in the platform, (3) the PI connects to the Collaboration Bucket to specify, via the Geo Resource Browser, a query that encompasses all the produced processing results. This query is then published in the Collaboration Bucket. From this point, the Processing Results collection can undergo external validation, to assess the product quality. After successful validation, the Operator can (4) include the product in the official platform services advertised in the Marketplace.
