• Marek Vavrovic

Data Warehouse Dimension Loading Using Lookup and Conditional Split Transformations.

I will be doing incremental dimension loading amoung 3 databases. All 3 of them are on the same server.

I have a transactional DB Salome containing Sales.Customer table with data about customers.

stg_Salome DB contains 2 tables CustomerUpdates, CustomerInserts.

SalomeDW contains one table: DimCustomer. This is the table I want to use for the incremental data loading.


Data overview.



Table definitions


use Salome;

go


Create table Sales.Customer

(

customer_id int primary key identity(1,1),

customer_acct_num nvarchar(15),

first_name nvarchar(50),

last_name nvarchar(50),

customer_address nvarchar(50),

customer_city nvarchar(50),

customer_country nvarchar(50),

birthdate date,

education nvarchar(50),

acct_open_date date,

member_card nvarchar(50),

occupation nvarchar(50),

homeowner char(1),

acct_close_date date

)


Tables in the Staging Database. (CustomerInserts, CustomerUpdates)

Use stg_Salome;

go


Create table CustomerUpdates

(

customerkey int primary key identity(100,1),

customer_id int,

customer_acct_num nvarchar(15),

first_name nvarchar(50),

last_name nvarchar(50),

customer_address nvarchar(50),

customer_city nvarchar(50),

customer_country nvarchar(50),

birthdate date,

education nvarchar(50),

acct_open_date date,

member_card nvarchar(50),

occupation nvarchar(50),

homeowner char(1),

acct_close_date date)


Create table CustomerInserts

(

customer_id int ,

customer_acct_num nvarchar(15),

first_name nvarchar(50),

last_name nvarchar(50),

customer_address nvarchar(50),

customer_city nvarchar(50),

customer_country nvarchar(50),

birthdate date,

education nvarchar(50),

acct_open_date date,

member_card nvarchar(50),

occupation nvarchar(50),

homeowner char(1),

acct_close_date date

)


Use SalomeDW;

go

Create table DimCustomer

(

customerkey int primary key identity(100,1),

customer_id int,

customer_acct_num nvarchar(15),

first_name nvarchar(50),

last_name nvarchar(50),

customer_address nvarchar(50),

customer_city nvarchar(50),

customer_country nvarchar(50),

birthdate date,

education nvarchar(50),

acct_open_date date,

member_card nvarchar(50),

occupation nvarchar(50),

homeowner char(1),

acct_close_date date

)


Columns used


SCD 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)

SCD 2: Historical attribute (customer_acct_num, member_card)

SCD 0 = fixed

CustomerUpdates will catch the SCD1 updates

CustomerInserts will catch the SCD2 updates

Everything else will go directly into DimCustomer table.


SSIS

Execute SQL Task: First thing I am doing is truncating the staging tables.

Data Flow Task:

Data Flow 1: I am going the set up this solution


OLE DB Source: data from the transactional database system.


Lookup transformation:

no matching entries = Ignore failure.

Connection:


I am using Lookup transformation to check if the data already exists in the dimension table.

Using WHERE [acct_close_date] is null to select the active records only. I do not have any expired data in the transactional table. Also do not need the CustemerKey from the dimension table to performe this lookup transformation.

Columns:


Connect customer_id from transactional DB to customer_id in DW. Append "_OUT" to be able to recognize which columns comming from transactional table and which columns are from data warehouse table.

Confirme with OK.

Conditional Split:


1. Inserts (New records): If a new record is added on the source, that record is null on the destination: ISNULL(customer_id_OUT) . I use this output to insert data directly into DimCustomer.


2. SCD Type 1 = Changing attribute (customer_address,customer_city, customer_country, occupation)

if source do not match destination in these values we use the Condition split output to insert data into CustomerUpdates table (staging table)

customer_address != customer_address_OUT || customer_city != customer_city_OUT || customer_country != customer_country_OUT || occupation != occupation_OUT


3. SCD 2: Historical attribute (customer_acct_num, member_card)

I have defined two attributes as SCD2. If data from source table do not match the destination data this output will be used to update the destination table. Using CustomerInserts table.

customer_acct_num != customer_acct_num_OUT || member_card != member_card_OUT


4. SCD Type 0: Fixed Attribute. I am not using any code, ignoring the changes on the source table.

Inserts Output:

Newly inserted data without any modifications (updates) go directly into data warehouse table DimCustomer.

ChangingAttribute Output:

SCD 1 = Changing attribute (customer_address,customer_city, customer_country, occupation)

Data will be inserted into stg_Salome staging DB, CustomerUpdates table. Later on I'll be using Execute SQL Task to update the DimCustomer table using data from this table.

HistoricalAttribute Output:

SCD 2: Historical attribute (customer_acct_num, member_card)

I will use CustomerInserts table to insert a record into acct_close_date. To mark the record on dimensional table as expired.


We are back on the Data Flow. I am going to add an Execute SQL Task. This transformation will perform 2 update statements to update the dimension table.


  1. if any SCD type 1 update happened

SCD Type 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)


I'll be using this update statement: (update between 2 databases)


UPDATE DW

SET

DW.customer_address = SCD1.customer_address,

DW.customer_city =SCD1.customer_city,

DW.customer_country = SCD1.customer_country,

DW.occupation = SCD1.occupation

FROM SalomeDW.dbo.DimCustomer DW

INNER JOIN stg_Salome.dbo.CustomerUpdates SCD1

ON DW.customer_id = SCD1.customer_id;

GO


2. if any SCD type 2 update happened

SCD 2: Historical attribute (customer_acct_num, member_card)


I'll update the end-date of the acct_close_date column in DimCustomer table.


UPDATE DW

SET

DW.acct_close_date= CAST(GETDATE() AS date)

FROM SalomeDW.dbo.DimCustomer DW

INNER JOIN stg_Salome.dbo.CustomerInserts SCD2

ON DW.customer_id = scd2.customer_id

WHERE DW.acct_close_date is null



Data Flow Task 1:

I did not insert the changes from SCD 2: Historical attribute (customer_acct_num, member_card) into dimension table yet. I am going to do it in this step.

OLE DB Source:

using CustomerInserts table.

OLE DB Destination:

Using DimCustomer dimension table as destination. This is the last transformation inside this package. The packge is ready to use for the incremental dimension data loading.



Testing the solution


  1. Initial data loading

Data warehouse table is empty


Running SSIS package for the initial data loading. Data have been loaded from the source into destination. No updates performed, using output just for the new data.


Result



2. Running SCD Type 1 & Type 2 updates


---SCD 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)


UPDATE Sales.Customer

SET

customer_address = 'TEST1',

customer_city = 'TEST1',

customer_country ='TEST1',

occupation ='TEST1'

WHERE customer_id = 1


--SCD 2: Historical attribute (customer_acct_num, member_card)


UPDATE Sales.Customer

SET

customer_acct_num = '000000000',

member_card = 'Yellow'

WHERE customer_id = 1


The rest of the columns are SCD Type 0, fixed attributes. Any update on the source table will be ingnored.



12 views0 comments

Recent Posts

See All