SSIS Best Practices - Microsoft

1.75 Developers. SSIS 2005 .... See Performance Best Practices. „FastParse‟ for text ... ...
4MB Sizes 771 Downloads 3966 Views
Irish SQL Academy 2008. Level 300

Bob Duffy

DTS 2000

SSIS 2005

1.75 Developers

*Figures are only approximations and should not be referenced or quoted

Optimize and Stabilize the basics

Measure Tune Parallelize

• Minimize staging (else use RawFiles if possible) • Hardware Infrastructure: Disks, RAM, CPU, Network • SQL Infrastructure: File Groups, Indexing, Partitioning

• Replace destinations with RowCount • Source->RowCount throughput • Source->Destination throughput • OVAL performance tuning strategy • The Three S‟s • Data Flow Bag of Tricks

• Lookup patterns • Script vs custom transform


• Increase the efficiency of every aspect


• Parallelize, partition, pipeline


• Buy faster, bigger, better hardware

But be aware of limitations

Row Based (synchronous)

Partially Blocking (asynchronous)

Blocking (asynchronous)

Source data Source servers EMC CX600 run SSIS 2 Gb Fiber Channel

Destination server runs SQL Server

Database EMC CX3-80

1 Gb Ethernet connections

4 Gb Fiber Channel Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008

Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008

Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel® Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel® PRO/1000 MT Server Adapter Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:

4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel® Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)





Orders Table Partition 1


Partition 2


Partition 3


Partition 4


Partition 5


Partition 6













Partition 56



SSIS orders.tbl.2



SSIS orders.tbl.1

Partition 55


(Package details removed to protect the innocent)

Follow Microsoft Development Guidelines

• Iterative design, development & testing

Understand the Business

• People & Processes • Kimball‟s ETL and SSIS books are an excellent reference

Get the big picture

Platform considerations

• Resource contention, processing windows, … • SSIS does not forgive bad database design • Old principles still apply – e.g. load with/without indexes? • Will this run on IA64 / X64? • No BIDS on IA64 – how will I debug? • Is OLE-DB driver XXX available on IA64? • Memory and resource usage on different platforms

Process Modularity

• Break complex ETL into logically distinct packages (vs monolithic design) • Improves development & debug experience

Package Modularity

• Separate sub-processes within package into separate Containers • More elegant, easier to develop • Simple to disable whole Containers when debugging

Component Modularity

• Use Script Task/Transform for one-off problems • Build custom components for maximum re-use

Concise naming conventions Conformed “blueprint” design patterns Presentable layout

Annotations Error Logging Configurations

Get as close to the data as possible • Limit number of columns • Filter number of rows

Don‟t be afraid to leverage TSQL • Type conversions, null coercing, coalescing, data type sharpening • select nullif(name, „‟) from contacts order by 1 • select convert(tinyint, code) from sales

Performance Testing & Tuning • Connect Output to RowCount transform • See Performance Best Practices

„FastParse‟ for text files



select select * from etl.uf_FactStoreSales(@Date) dbo.Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1

Use the power of TSQL to clean the data 'on the fly'

Avoid overdesign Maximize Parallelism Minimize blocking Minimize ancillary data

• Too many moving parts is inelegant and likely slow • But don‟t be afraid to experiment – there are many ways to solve a problem • Allocate enough threads • EngineThreads property on DataFlow Task • See Performance Talk

• Synchronous vs. Asynchronous components • Memcopy is expensive

• For example, minimize data retrieved by LookupTx

Three Modes of Operation

Tradeoff memory vs. performance Can use Merge Join component instead

• Full Cache – for small lookup datasets • No Cache – for volatile lookup datasets • Partial Cache – for large lookup datasets

• Full Cache is optimal, but uses the most memory, also takes time to load • Partial Cache can be expensive since it populates on the fly using singleton SELECTs • No Cache uses no memory, but takes longer

• Catch is that it requires Sorted inputs • See SSIS Performance white paper for more details

Custom components


• Can written in any .Net language • Must be signed, registered and installed – but can be widely re-used • Quite fiddly for single task

• Can be written in VisualBasic.Net or C# • Are persisted within a package – and have limited reuse • Have template methods already created for you