SSIS Best Practices - Microsoft [PDF]

44 downloads 1055 Views 4MB Size Report
Bob Duffy. Irish SQL Academy 2008. Level 300 ... 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC. Quantity: 4. Make: Unisys. Model: ES3220L. OS:.
Irish SQL Academy 2008. Level 300

Bob Duffy

DTS 2000

SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted

Optimize and Stabilize the basics

Measure Tune Parallelize

• Minimize staging (else use RawFiles if possible) • Hardware Infrastructure: Disks, RAM, CPU, Network • SQL Infrastructure: File Groups, Indexing, Partitioning

• Replace destinations with RowCount • Source->RowCount throughput • Source->Destination throughput • OVAL performance tuning strategy • The Three S‟s • Data Flow Bag of Tricks

• Lookup patterns • Script vs custom transform

Sharpen

• Increase the efficiency of every aspect

Share

• Parallelize, partition, pipeline

Spend

• Buy faster, bigger, better hardware

But be aware of limitations

Row Based (synchronous)

Partially Blocking (asynchronous)

Blocking (asynchronous)

http://msdn.microsoft.com/en-us/library/ms345346.aspx

Source data Source servers EMC CX600 run SSIS 2 Gb Fiber Channel

Destination server runs SQL Server

Database EMC CX3-80

1 Gb Ethernet connections

4 Gb Fiber Channel Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008

Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008

Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel® Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel® PRO/1000 MT Server Adapter Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel® Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)

C1

C1

C1

C1

Orders Table Partition 1

Orders_1

Partition 2

Orders_2

Partition 3

Orders_3

Partition 4

Orders_4

Partition 5

Orders_5

Partition 6

Orders_6

SSIS

SSIS

SSIS

SSIS

orders.tbl.3

orders.tbl.4

orders.tbl.5

orders.tbl.6

...

Orders_55

orders.tbl.55

Partition 56

Orders_56

SSIS

SSIS orders.tbl.2

...

SSIS

SSIS orders.tbl.1

Partition 55

orders.tbl.56

(Package details removed to protect the innocent)

Follow Microsoft Development Guidelines

• Iterative design, development & testing

Understand the Business

• People & Processes • Kimball‟s ETL and SSIS books are an excellent reference

Get the big picture

Platform considerations

• Resource contention, processing windows, … • SSIS does not forgive bad database design • Old principles still apply – e.g. load with/without indexes? • Will this run on IA64 / X64? • No BIDS on IA64 – how will I debug? • Is OLE-DB driver XXX available on IA64? • Memory and resource usage on different platforms

Process Modularity

• Break complex ETL into logically distinct packages (vs monolithic design) • Improves development & debug experience

Package Modularity

• Separate sub-processes within package into separate Containers • More elegant, easier to develop • Simple to disable whole Containers when debugging

Component Modularity

• Use Script Task/Transform for one-off problems • Build custom components for maximum re-use

Concise naming conventions Conformed “blueprint” design patterns Presentable layout

Annotations Error Logging Configurations

Get as close to the data as possible • Limit number of columns • Filter number of rows

Don‟t be afraid to leverage TSQL • Type conversions, null coercing, coalescing, data type sharpening • select nullif(name, „‟) from contacts order by 1 • select convert(tinyint, code) from sales

Performance Testing & Tuning • Connect Output to RowCount transform • See Performance Best Practices

„FastParse‟ for text files

BEFORE:

AFTER:

select select * from etl.uf_FactStoreSales(@Date) dbo.Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1

Use the power of TSQL to clean the data 'on the fly'

Avoid overdesign Maximize Parallelism Minimize blocking Minimize ancillary data

• Too many moving parts is inelegant and likely slow • But don‟t be afraid to experiment – there are many ways to solve a problem • Allocate enough threads • EngineThreads property on DataFlow Task • See Performance Talk

• Synchronous vs. Asynchronous components • Memcopy is expensive

• For example, minimize data retrieved by LookupTx

Three Modes of Operation

Tradeoff memory vs. performance Can use Merge Join component instead

• Full Cache – for small lookup datasets • No Cache – for volatile lookup datasets • Partial Cache – for large lookup datasets

• Full Cache is optimal, but uses the most memory, also takes time to load • Partial Cache can be expensive since it populates on the fly using singleton SELECTs • No Cache uses no memory, but takes longer

• Catch is that it requires Sorted inputs • See SSIS Performance white paper for more details

Custom components

Scripts

• Can written in any .Net language • Must be signed, registered and installed – but can be widely re-used • Quite fiddly for single task

• Can be written in VisualBasic.Net or C# • Are persisted within a package – and have limited reuse • Have template methods already created for you

http://sqlcat.com

http://technet.microsoft.com/en-us/library/bb961995.aspx

http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx