Fannie Mae


© 2015 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.

ddn.com

Accelerating Data Intensive Financial Analytics by 450-500%
Accelerating SAS, Informatica, R, Ab Initio, Matlab and in-house codes

John Eubanks, Systems Engineer V, Fannie Mae


Summary

▶ Engineered comprehensive solution
▶ Exploited IB-RDMA to deliver higher throughput
▶ Eliminated TCP encapsulation overhead
▶ Higher throughput per core
▶ Fannie Mae accelerated key elements of its data intensive risk analysis in issuance of mortgage backed securities over 450%
▶ Minimized TCO and Maximized ROI

Fannie Mae

▶ The leading source of residential mortgage credit in the U.S. secondary market
  •  establish and implement industry standards,
  •  develop better tools to price and manage credit risk,
  •  build new infrastructure to ensure a liquid and efficient market, and
  •  facilitate the collection and reporting of data for accurate financial reporting and improved risk management.
▶ Fannie Mae is supporting today's economic recovery and helping to build a sustainable housing finance system

Fannie Mae Risk Analytics Modeling

▶ Risk Analytics Modeling Framework based on several data intensive applications
  •  Combination of SAS, Informatica, R, Ab Initio, Matlab and in-house codes
  •  Primarily executed in a UNIX environment
▶ Daily, weekly, and monthly ETL processes using the modeling framework
  •  Also used for reporting and modeling processes
▶ Common workflows involve extraction and manipulation of billions of records of loan-level data
  •  Construct panel data sets and build competing risk models using various analytical methods
  •  Robust loan risk analysis system

Risk Analytics and IO Bottleneck Challenges

▶ Time Sensitive Workloads
  •  Prepare optimized algorithms for risk analytics
  •  Limited window to deliver accurate risk predictions
  •  Deliver higher performance for mixed I/O SAS, Informatica, R, Ab Initio, Matlab and in-house workloads
▶ Data Intensive Risk Analytics Challenges
  •  Extremely compute and IO intensive
  •  The number of users was scaling up along with the data
  •  IO bottlenecks in the prior infrastructure were crippling performance for current and emerging workflow growth trends

Broader Impact to Stakeholders

▶ We were able to run only select risk modeling scenarios across select paths
▶ Data storage infrastructure was siloed on a use-case basis, limiting performance and increasing costs
▶ Turnaround time for mission critical Data Intensive workloads such as SAS, R, Informatica, Matlab and home grown applications was long
▶ Lower throughput forced us to deploy more compute resources to accommodate longer runtimes, resulting in increased OPEX
▶ Limited our capability to address broad customer stakeholders within the company
▶ Due to the growing demand for HPC powered Risk Modeling, the Data Center footprint had grown considerably
▶ Total cost of ownership, including licensing, services and support costs, had grown, resulting in very high Price-Performance premiums

Urgent Need for High Performance And Throughput Solutions

▶ Needed infrastructure that:
  •  Delivered higher performance for Data Intensive Workflows (Such as SAS, Informatica, R, Ab Initio, Matlab and in house codes)
  •  Eliminated IO Bottlenecks and Data Silos
  •  Mini