I would like to introduce Type 6 EBS Modified SCD and get your feedback. I couldn’t find any documentation explaining this approach so I guess this is a first, please prove me wrong.
So it’s finally happened Bangalore is now officially Bengaluru. Naturally this means a lot of the changes on the road and Highways of India. Changes to Google maps and other cartography software and most importantly the postal system.
A minor impact is going to be the database systems which store addresses. So I figured now would be a good time to revisit slowly changing dimensions. Most people are familiar with the Slowly Changing Dimension task available in SSIS DFT. But people don’t realize there are more than 3 SCD types. So here are the types once more
Type 0
No Change at all, Ignore any change to the state columns
Id | City | State |
1 | Bangalore | KA |
This is probably not what most people will do since they would want to reflect the new name on their site.
Type 1
Overwrite existing record
Before
Id | City | State |
1 | Bangalore | KA |
After
Id | City | State |
1 | Bengaluru | KA |
This solution seems to be the best or easiest approach. Historical representation of the data doesn’t add value in most cases.
Type 2
Store Historical data by adding new row.
Before
Id | City | State |
1 | Bangalore | KA |
After
Id | City | State | EndDate |
1 | Bangalore | KA | 31 Oct 2014 |
2 | Bengaluru | KA | NULL |
Type 3
Add new column for change data
Before
Id | City | State |
1 | Bangalore | KA |
After
Id | City | State | New City Name |
1 | Bangalore | KA | Bengaluru |
Most likely not going to be used since it doesn’t provide any effective date for the new change and doesn’t not support. Also complicates the application logic.
Type 4
Store historical data in history table
Now(Current table)
Id | City | State |
1 | Bengaluru | KA |
Now (History)
Id | City | State | Create Date |
1 | Bangalore | KA | 1 November 2014 |
A popular approach that is used frequently to capture change but it makes the historical data virtually inaccessible for real time queries without complicated application logic.
Type 6
A Combination of Type (1 +2 +3). It’s a slight complicated model. I can’t see any real world scenario where this is actually implemented I think it’s more of an intellectual exercise.
Before
Id | City | State |
1 | Bangalore | KA |
After
Id | City | State | New City Name | Effective Date | Current |
1 | Bangalore | KA | Bangalore | 31 Oct 2014 | N |
2 | Bengaluru | KA | Bengaluru | NULL | Y |
Type 6 EBS Modified
This is a modified version of the type 6 Dimension that I came up with while writing this blog. I haven’t got the details ironed out but would love to hear any thoughts on this approach.
Before
Id | City | State |
1 | Bangalore | KA |
After
Id | City | State | Parent | Effective Date |
1 | Bangalore | KA | 1 | 31 Oct 2014 |
2 | Bengaluru | KA | 1 | 1 Dec 2050 |
3 | Bengaluru | Union territory | 2 | NULL |
The difference between this one and the Type 6 is the capture of additional hierarchy of the change information. The best way to understand this is to compare the result of the “after” case in both types. In the Type 6 EBS Modified version we get rid of the Current flag and use NULL effective date to identify current rows. Additionally we replace the New City Name with ParentID, why? The Type 6 Dimension will fail if there are 2 cities called Bangalore in different state of India. It loses the ability to correctly identify the parent record of the change. The idea being that by recursively looping this table on id and parent id we get the complete and accurate change log for the City column.
Please Consider Subscribing
