SSIS Best Practices - Microsoft

Bob Duffy. Irish SQL Academy 2008. Level 300 ... 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC. Quantity: 4. Make: Unisys. Model: ES3220L. OS:.
4MB Sizes 22 Downloads 432 Views
Irish SQL Academy 2008. Level 300

Bob Duffy

DTS 2000

SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted

Optimize and Stabilize the basics

Measure Tune Parallelize

• Minimize staging (else use RawFiles if possible) • Hardware Infrastructure: Disks, RAM, CPU, Network • SQL Infrastructure: File Groups, Indexing, Partitioning

• Replace destinations with RowCount • Source->RowCount throughput • Source->Destination throughput • OVAL performance tuning strategy • The Three S‟s • Data Flow Bag of Tricks

• Lookup patterns • Script vs custom transform


• Increase the efficiency of every aspect


• Parallelize, partition, pipeline


• Buy faster, bigger, better hardware

But be aware of limitations

Row Based (synchronous)

Partially Blocking (asynchronous)

Blocking (asynchronous)

Source data Source servers EMC CX600 run SSIS 2 Gb Fiber Channel

Destination server runs SQL Server

Database EMC CX3-80

1 Gb Ethernet connections

4 Gb Fiber Channel Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008

Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008

Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel® Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel® PRO/1000 MT Server Adapter Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel® Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)





Orders Table Partition 1


Partition 2


Partition 3


Partition 4


Partition 5


Partition 6













Partition 56



SSIS orders.tbl.2



SSIS orders.tbl.1

Partition 55


(Package details removed to protect the innocent)

Follow Microsoft Development Guidelines

• Iterative design, development & testing

Understand the Business

• People & Processes • Kimball‟s ETL and SSIS books are an excellent reference

Get the big picture

Platform considerations

• Resource contention, processing windows, … • SSIS does not forgive bad database design • Old principles still apply – e.g. load with/without indexes? • Will this run on IA64 / X64? • No BIDS on IA64 – how will I debug? • Is OLE-DB driver XXX available on IA64? • Memory and resource usage on different platforms

Process Modularity

• Break complex ETL into logically distinct packages (vs monolithic design) • Improves development & debug experience

Package Modularity

• Separate sub-processes within package into separate Containers • More elegant, easier to develop • Simple to disable whole Containers when debugging

Component Modularity

• Use Script Task/Transform for one-off problems • Build custom components for maximum re-use

Concise naming conventions Conformed “blueprint” design patterns Presentable layout

Annotations Error Logging Configurations

Get as close to the data as possible • Limit number of columns • Filter number of rows

Don‟t be afraid to leverage TSQL • Type conversions, null coercing, coalescing, data type sharpening • select nullif(name, „‟) from contacts order by 1 • select convert(tinyint, code) from sales

Performance Testing & Tuning • Connect Output to RowCount transform • See Performance Best Practices

„FastParse‟ for text files



select select * from etl.uf_FactStoreSales(@Date) dbo.Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1

Use the power of TSQL to clean the data 'on the fly'

Avoid overdesign Maximize Parallelism Minimize blocking Minimize ancillary data

• Too many moving parts is inelegant and likely slow • But don‟t be afraid to experiment – there are many ways to solve a problem • Allocate enough threads • EngineThreads property on DataFlow Task • See Performance Talk

• Synchronous vs. Asynchronous components • Memcopy is expensive

• For example, minimize data retrieved by LookupTx

Three Modes of Operation

Tradeoff memory vs. performance Can use Merge Join component instead

• Full Cache – for small lookup datasets • No Cache – for volatile lookup datasets • Partial Cache – for large lookup datasets

• Full Cache is optimal, but uses the most memory, also takes time to load • Partial Cache can be expensive since it populates on the fly using singleton SELECTs • No Cache uses no memory, but takes longer

• Catch is that it requires Sorted inputs • See SSIS Performance white paper for more details

Custom components


• Can written in any .Net language • Must be signed, registered and installed – but can be widely re-used • Quite fiddly for single task

• Can be written in VisualBasic.Net or C# • Are persisted within a package – and have limited reuse • Have template methods already created for you