• Marek Vavrovic

Merge Join Transformation in SSIS

Updated: Sep 29

The Merge Join Transformation in SSIS is used to perform SQL Joins such as Inner Join, Left Outer Join, Full Outer Join and Right Outer Join (indirectly achieved by Swapping the tables) in SQL Server Integration Services. SSIS Merge Join Transformation is very useful to load data into the Dimension tables in Data Warehouse.


The Merge Join Transformation in SSIS will only work with Sorted data. So, Sort Transformation is mandatory before applying any joins using Merge Join Transformation. This transformation works similarly as the ON clause in the SQL Server.


Merge Join Transformation in SSIS Example


I want to get UnitPrice, Quantity, Discount and DiscountedPrice by RegionDescription.

Using Northwind database.


In this article, we are going to perform Inner Join on five SQL tables using Merge Join Transformation in SSIS. Before we start creating the SSIS Package, let us look at our 5 source tables on which we are going to perform Inner Join using ssis Merge Join Transformation.



I have skipped table Employees because EmployeeTerritories contains column EmployeeID which can be join with EmployeeID from Orders


Select r.RegionDescription As Region,

UnitPrice,

Quantity,

Discount,

UnitPrice*(1-Discount) as DiscountedPrice

from [Order Details] od

inner join Orders o

on od.OrderID = o.OrderID

inner join EmployeeTerritories et

on o.EmployeeID = et.EmployeeID

inner join Territories t

on et.TerritoryID =t.TerritoryID

Inner join Region r

on t.RegionID = r.RegionID


Step 1:

All tables coming form SQL Database. I will be using OLE DB Source for them.


Add Data Flow Task, double click.

Add two Ole Db Sources for Order Details and Orders table.

Data Access Mode : Table or view. Extracting all columns.


Step 2:

Add Sort transformation.

Inputs must be sorted otherwise Merge Join Transformation returns an error.

When you double click on the Sort, Available Input Columns window shows up.


Name: specify the column to sort [OrderID on the left and right side] These column must be the same. It is like ON clause in the SQL Join

ON [Order Details].OrderID = Orders.OrderID

In the next step I will need EmployeeID that's why I retrieved OrderID and EmployeeID from Orders table on the right side.

Pass Through: specify the columns you want to retrieve for the Merge Join Transformation.

Step 3:

Add Merge Join Transformation

  1. this input is coming from the left side table, specify Merge Join Left Input. There will be 1 input left for Orders table. This table will take automatically the second input: Merge Join Right input

  2. [connect it]

Step 4:

Join type: Inner Join

From the Sort Input I take all the columns, from Sort 1 input just EmployeeID. I will need EmployeeID for Merge Join with EmployeeTerrirories table.

Step 5:

5.1 Add Ole DB Source, EmployeeTerritories table

5.2 Add Sort 2 trans. Column to sort: EmployeeID


In the next Merge join I will build this part of the Join.

Select

UnitPrice,

Quantity,

Discount,

UnitPrice*(1-Discount) as DiscountedPrice

from [Order Details] od

inner join Orders o

on od.OrderID = o.OrderID

inner join EmployeeTerritories et

on o.EmployeeID = et.EmployeeID


Step 6:

Add Sort 3 to Merge Join tran.

Specify the sort column: EmployeID

I must have the same sort columns on both side otherwise the next Merge Join 2 will return an error. [on o.EmployeeID = et.EmployeeID]


Step 7:

Add Merge join 2

From the table on the left side I retrieve all columns besides EmployeeID. I wont need this column anymore and from EmployeeTerritories table I need just TerritoryID to join Territories, which is the next table to join. [on et.TerritoryID =t.TerritoryID]


Select r.RegionDescription As Region,

UnitPrice,

Quantity,

Discount,

UnitPrice*(1-Discount) as DiscountedPrice

from [Order Details] od

inner join Orders o

on od.OrderID = o.OrderID

inner join EmployeeTerritories et

on o.EmployeeID = et.EmployeeID

inner join Territories t

on et.TerritoryID =t.TerritoryID


Step 8:

Add Territories table. Connect it to Sort transformation [Sort TerritoryID] sorted by TerritoryID

Add Sort_TerritoryID transformation to Merge Join 1. Specify the column to sort [TerritoryID]

Add Merge Join 2 . Retrieve all columns from the table on the left side. From the right-side table I need just RegionID column

Step 9:

Add the last table Region.

Add Sort 4, specify RegionID to sort the data set. The other RegionID column comes from Merge Join 2


Step 10:

Add Sort 5 to specify the column to sort for Merge Join 2, which is basically ON clause from SQL server:

Inner join Region r

on [Merge Join 2].RegionID = r.RegionID

Step 11:

11.1 Add Merge Join 3 , Join type: Inner Join

11.2 On Sort 5 and Sort 4, specify which columns should be retrieved from this operation, by checking the boxes.

Step 12

Add Derived Column transformation to create DiscountedPrice column.

[UnitPrice]*(1- [Discount])

Step 13

Add OLE DB Destination


Step 14

Run the package


Step 15

RegionOrders table, which is the table I got by this Merge join transformation returned the same result as SQL Join code, so everything is OK


9 views0 comments

Recent Posts

See All