I will be doing incremental dimension loading amoung 3 databases. All 3 of them are on the same server.
I have a transactional DB Salome containing Sales.Customer table with data about customers.
stg_Salome DB contains 2 tables CustomerUpdates, CustomerInserts.
SalomeDW contains one table: DimCustomer. This is the table I want to use for the incremental data loading.
Data overview.
Table definitions
use Salome;
go
Create table Sales.Customer
(
customer_id int primary key identity(1,1),
customer_acct_num nvarchar(15),
first_name nvarchar(50),
last_name nvarchar(50),
customer_address nvarchar(50),
customer_city nvarchar(50),
customer_country nvarchar(50),
birthdate date,
education nvarchar(50),
acct_open_date date,
member_card nvarchar(50),
occupation nvarchar(50),
homeowner char(1),
acct_close_date date
)
Tables in the Staging Database. (CustomerInserts, CustomerUpdates)
Use stg_Salome;
go
Create table CustomerUpdates
(
customerkey int primary key identity(100,1),
customer_id int,
customer_acct_num nvarchar(15),
first_name nvarchar(50),
last_name nvarchar(50),
customer_address nvarchar(50),
customer_city nvarchar(50),
customer_country nvarchar(50),
birthdate date,
education nvarchar(50),
acct_open_date date,
member_card nvarchar(50),
occupation nvarchar(50),
homeowner char(1),
acct_close_date date)
Create table CustomerInserts
(
customer_id int ,
customer_acct_num nvarchar(15),
first_name nvarchar(50),
last_name nvarchar(50),
customer_address nvarchar(50),
customer_city nvarchar(50),
customer_country nvarchar(50),
birthdate date,
education nvarchar(50),
acct_open_date date,
member_card nvarchar(50),
occupation nvarchar(50),
homeowner char(1),
acct_close_date date
)
Use SalomeDW;
go
Create table DimCustomer
(
customerkey int primary key identity(100,1),
customer_id int,
customer_acct_num nvarchar(15),
first_name nvarchar(50),
last_name nvarchar(50),
customer_address nvarchar(50),
customer_city nvarchar(50),
customer_country nvarchar(50),
birthdate date,
education nvarchar(50),
acct_open_date date,
member_card nvarchar(50),
occupation nvarchar(50),
homeowner char(1),
acct_close_date date
)
Columns used
SCD 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)
SCD 2: Historical attribute (customer_acct_num, member_card)
SCD 0 = fixed
CustomerUpdates will catch the SCD1 updates
CustomerInserts will catch the SCD2 updates
Everything else will go directly into DimCustomer table.
SSIS
Execute SQL Task: First thing I am doing is truncating the staging tables.
Data Flow Task:
Data Flow 1: I am going the set up this solution
OLE DB Source: data from the transactional database system.
Lookup transformation:
no matching entries = Ignore failure.
Connection:
I am using Lookup transformation to check if the data already exists in the dimension table.
Using WHERE [acct_close_date] is null to select the active records only. I do not have any expired data in the transactional table. Also do not need the CustemerKey from the dimension table to performe this lookup transformation.
Columns:
Connect customer_id from transactional DB to customer_id in DW. Append "_OUT" to be able to recognize which columns comming from transactional table and which columns are from data warehouse table.
Confirme with OK.
Conditional Split:
1. Inserts (New records): If a new record is added on the source, that record is null on the destination: ISNULL(customer_id_OUT) . I use this output to insert data directly into DimCustomer.
2. SCD Type 1 = Changing attribute (customer_address,customer_city, customer_country, occupation)
if source do not match destination in these values we use the Condition split output to insert data into CustomerUpdates table (staging table)
customer_address != customer_address_OUT || customer_city != customer_city_OUT || customer_country != customer_country_OUT || occupation != occupation_OUT
3. SCD 2: Historical attribute (customer_acct_num, member_card)
I have defined two attributes as SCD2. If data from source table do not match the destination data this output will be used to update the destination table. Using CustomerInserts table.
customer_acct_num != customer_acct_num_OUT || member_card != member_card_OUT
4. SCD Type 0: Fixed Attribute. I am not using any code, ignoring the changes on the source table.
Inserts Output:
Newly inserted data without any modifications (updates) go directly into data warehouse table DimCustomer.
ChangingAttribute Output:
SCD 1 = Changing attribute (customer_address,customer_city, customer_country, occupation)
Data will be inserted into stg_Salome staging DB, CustomerUpdates table. Later on I'll be using Execute SQL Task to update the DimCustomer table using data from this table.
HistoricalAttribute Output:
SCD 2: Historical attribute (customer_acct_num, member_card)
I will use CustomerInserts table to insert a record into acct_close_date. To mark the record on dimensional table as expired.
We are back on the Data Flow. I am going to add an Execute SQL Task. This transformation will perform 2 update statements to update the dimension table.
if any SCD type 1 update happened
SCD Type 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)
I'll be using this update statement: (update between 2 databases)
UPDATE DW
SET
DW.customer_address = SCD1.customer_address,
DW.customer_city =SCD1.customer_city,
DW.customer_country = SCD1.customer_country,
DW.occupation = SCD1.occupation
FROM SalomeDW.dbo.DimCustomer DW
INNER JOIN stg_Salome.dbo.CustomerUpdates SCD1
ON DW.customer_id = SCD1.customer_id;
GO
2. if any SCD type 2 update happened
SCD 2: Historical attribute (customer_acct_num, member_card)
I'll update the end-date of the acct_close_date column in DimCustomer table.
UPDATE DW
SET
DW.acct_close_date= CAST(GETDATE() AS date)
FROM SalomeDW.dbo.DimCustomer DW
INNER JOIN stg_Salome.dbo.CustomerInserts SCD2
ON DW.customer_id = scd2.customer_id
WHERE DW.acct_close_date is null
Data Flow Task 1:
I did not insert the changes from SCD 2: Historical attribute (customer_acct_num, member_card) into dimension table yet. I am going to do it in this step.
OLE DB Source:
using CustomerInserts table.
OLE DB Destination:
Using DimCustomer dimension table as destination. This is the last transformation inside this package. The packge is ready to use for the incremental dimension data loading.
Testing the solution
Initial data loading
Data warehouse table is empty
Running SSIS package for the initial data loading. Data have been loaded from the source into destination. No updates performed, using output just for the new data.
Result
2. Running SCD Type 1 & Type 2 updates
---SCD 1 = Changing attribute (customer_address, customer_city, customer_country, occupation)
UPDATE Sales.Customer
SET
customer_address = 'TEST1',
customer_city = 'TEST1',
customer_country ='TEST1',
occupation ='TEST1'
WHERE customer_id = 1
--SCD 2: Historical attribute (customer_acct_num, member_card)
UPDATE Sales.Customer
SET
customer_acct_num = '000000000',
member_card = 'Yellow'
WHERE customer_id = 1
The rest of the columns are SCD Type 0, fixed attributes. Any update on the source table will be ingnored.
Comments