Change Data Capture (CDC) in Microsoft SQL Server
Change Data Capture is a design pattern to create a delta/change data set which can be used in different integration scenarios including an Extract Transform Load (ETL) process from a transactional database used by the application to an analytics database for analytics on the latest transactional data.
Here are my notes from a video from Microsoft Learn that discusses CDC in the context of SQL Server and Azure SQL.
Why use CDC?
- Track data changes for audit purposes and perform analytics on change data
- Propagate changes to downstream subscribers
- Execute ETL operations to move data changes in the OLTP system to data lake or data warehouse using Azure data factory (ADF)
- Event based programming that provides instantaneous responses based on data changes. For example providing dynamic pricing based on changes in inventory.
This allows a push based approach which allows to react instantaneously to changes in the data.
Notes on enabling CDC in databases
- For Microsoft SQL databases, CDC needs to be activated first at database level by executing a stored procedure
- Once enabled it will create few additional tables in a separate database schema as system tables which can have impact on the storage that is used for the database. These tables include storing the captured columns, what are the change tables enabled for CDC, DDL history of the change tables, log sequence number time mapping, etc.
- Also, need to enable CDC at the table level by running another stored procedure
- For SQL Server need to ensure SQL Server agent is running
- Factors that may impact performance: number of CDC enabled tables, frequency of changes in the tracked tables, space available in source database, etc.
In summary, CDC should be considered when planning for efficient ETL for near real time Business intelligence keeping some of the performance trade-offs that might come when enabling it in the database in mind.