Azure Data Factory (ADF): Continuous integration and delivery (CI/CD)
Continuous integration is the practice of testing each change made to your codebase automatically and as early as possible. Continuous delivery follows the testing that happens during continuous integration and pushes changes to a staging or production system.
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another. Azure Data Factory utilizes Azure Resource Manager templates to store the configuration of your various ADF entities (pipelines, datasets, data flows, and so on). There are two suggested methods to promote a data factory to another environment:
- Automated deployment using Data Factory’s integration with Azure Pipelines
- Manually upload a Resource Manager template using Data Factory UX integration with Azure Resource Manager.
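To make the ARM-template representation concrete: the generated factory template describes each ADF entity as an ARM resource. Below is a minimal, illustrative fragment of the kind of entry that appears in the generated ARMTemplateForFactory.json; the pipeline and activity names are placeholders, not part of the official docs.

{
  "name": "[concat(parameters('factoryName'), '/SamplePipeline')]",
  "type": "Microsoft.DataFactory/factories/pipelines",
  "apiVersion": "2018-06-01",
  "properties": {
    "activities": [
      {
        "name": "WaitBeforeCopy",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 30 }
      }
    ]
  },
  "dependsOn": []
}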
This article focuses on the first of the two methods listed above for promoting a data factory to another environment. It provides a code repository structure and Azure DevOps pipeline templates that comply with the latest improvements (as of March 2024) suggested by Microsoft for CI/CD for Azure Data Factory.
Latest recommended CI/CD flow for ADF
1. Each user makes changes in their private branches.
2. Push to master isn’t allowed. Users must create a pull request to make changes.
3. The Azure DevOps pipeline build is triggered every time a new commit is made to master. It validates the resources and generates an ARM template as an artifact if validation succeeds.
4. The DevOps Release pipeline is configured to create a new release and deploy the ARM template each time a new build is available.
NOTE: Only the development factory is associated with a git repository. The test and production factories shouldn’t have a git repository associated with them and should only be updated via an Azure DevOps pipeline or via a Resource Manager template.
Organization of the Git repo associated with Azure Data Factory to use the DevOps templates below
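Microsoft doesn’t prescribe an exact layout, but the pipelines in this article assume a structure along these lines; the folder names match the packageJSONFolderPath and adfRootFolder values used later, and the main pipeline file name (azure-pipelines.yml here) is just a convention:

<repo root>
├── azure-pipelines.yml        # main build-and-deploy pipeline
├── build/
│   └── package.json           # pulls in @microsoft/azure-data-factory-utilities
├── app/
│   └── adf/                   # ADF root folder (pipelines, datasets, linked services, ...)
├── templates/
│   ├── template_build.yml     # build pipeline template
│   └── template_deploy.yml    # deploy pipeline template
└── vars/
    ├── dev.yml                # per-environment variable templates
    └── qa.yml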
The above repo organization complies with the Microsoft recommendation and follows the requirements mentioned in the official docs here. The YAML templates for the build and deploy Azure DevOps pipelines are shown in the next couple of sections.
Azure DevOps pipeline to validate and export an ARM template into a build artifact (build pipeline YAML)
# Sample YAML file to validate and export an ARM template into a build artifact
# Requires a package.json file located in the target repository
# Inspired by: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-improvements
parameters:
- name: packageJSONFolderPath
  type: string
- name: subscriptionId
  type: string
- name: resourceGroup
  type: string
- name: adfName
  type: string
- name: adfRootFolder
  type: string

jobs:
- job: Build
  timeoutInMinutes: 120
  pool:
    vmImage: 'ubuntu-latest'
  steps:
  # Installs Node and the npm packages saved in your package.json file in the build
  - task: UseNode@1
    inputs:
      version: '18.x'
    displayName: 'Install Node.js'
  - task: Npm@1
    inputs:
      command: 'install'
      workingDir: '$(Build.Repository.LocalPath)/${{ parameters.packageJSONFolderPath }}' # folder containing package.json
      verbose: true
    displayName: 'Install npm package'
  # Validates all of the Data Factory resources in the repository. You'll get the same validation errors as when "Validate All" is selected in the UX.
  # Enter the appropriate subscription and name for the source factory. Either the "Validate" or the "Validate and Generate ARM template" step is sufficient to perform validation; running both is unnecessary.
  - task: Npm@1
    inputs:
      command: 'custom'
      workingDir: '$(Build.Repository.LocalPath)/${{ parameters.packageJSONFolderPath }}' # folder containing package.json
      customCommand: 'run build validate $(Build.Repository.LocalPath)/${{ parameters.adfRootFolder }} /subscriptions/${{ parameters.subscriptionId }}/resourceGroups/${{ parameters.resourceGroup }}/providers/Microsoft.DataFactory/factories/${{ parameters.adfName }}'
    displayName: 'Validate'
  # Validate and then generate the ARM template into the destination folder, which is the same as selecting "Publish" from the UX.
  # The generated ARM template isn't published to the live version of the factory. Deployment should be done by using a CI/CD pipeline.
  - task: Npm@1
    inputs:
      command: 'custom'
      workingDir: '$(Build.Repository.LocalPath)/${{ parameters.packageJSONFolderPath }}' # folder containing package.json
      customCommand: 'run build export $(Build.Repository.LocalPath)/${{ parameters.adfRootFolder }} /subscriptions/${{ parameters.subscriptionId }}/resourceGroups/${{ parameters.resourceGroup }}/providers/Microsoft.DataFactory/factories/${{ parameters.adfName }} "ArmTemplate"'
    displayName: 'Validate and Generate ARM template'
  # Publish the artifact to be used as a source for the deploy pipeline.
  - task: PublishPipelineArtifact@1
    inputs:
      targetPath: '$(Build.Repository.LocalPath)/${{ parameters.packageJSONFolderPath }}/ArmTemplate'
      artifact: 'ArmTemplates'
      publishLocation: 'pipeline'
The above YAML configuration can be saved in a file called “template_build.yml”.
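The build template expects a package.json in the folder passed as packageJSONFolderPath (build/ in this article’s layout). Per the Microsoft docs, it only needs to reference the @microsoft/azure-data-factory-utilities package and expose a build script, along the lines of the following (pin the package version as appropriate for your project):

{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^1.0.0"
  }
}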
Azure DevOps pipeline to deploy the build artifact into a specific environment and update triggers (deploy/release pipeline YAML)
parameters:
- name: subscriptionId
  type: string
- name: resourceGroup
  type: string
- name: adfName
  type: string
- name: adfRootFolder
  type: string
- name: deployEnvironment
  type: string
- name: serviceConnection
  type: string
- name: location
  type: string
- name: overrideParameters
  type: string

jobs:
- deployment: DeployADF
  environment: ${{ parameters.deployEnvironment }}
  displayName: 'Deploy to ${{ parameters.deployEnvironment }} | ADF: ${{ parameters.adfName }}'
  timeoutInMinutes: 120
  pool:
    vmImage: 'ubuntu-latest'
  strategy:
    runOnce:
      deploy:
        steps:
        - checkout: none
        # Retrieve the ARM template from the build phase.
        - task: DownloadPipelineArtifact@2
          inputs:
            buildType: 'current'
            artifactName: 'ArmTemplates'
            targetPath: '$(Pipeline.Workspace)'
          displayName: 'Retrieve ARM template'
        # Deactivate ADF triggers before deployment.
        # Sample: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-sample-script
        - task: AzurePowerShell@5
          displayName: 'Stop ADF Triggers'
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: ${{ parameters.serviceConnection }}
            scriptPath: '$(Pipeline.Workspace)/PrePostDeploymentScript.ps1'
            ScriptArguments: -armTemplate "$(Pipeline.Workspace)/ARMTemplateForFactory.json" -ResourceGroupName ${{ parameters.resourceGroup }} -DataFactoryName ${{ parameters.adfName }} -predeployment $true -deleteDeployment $false
            errorActionPreference: stop
            FailOnStandardError: false
            azurePowerShellVersion: 'LatestVersion'
            pwsh: true
        # Deploy using the ARM template. Override ARM template parameters as required.
        - task: AzureResourceManagerTemplateDeployment@3
          displayName: 'Deploy using ARM Template'
          inputs:
            azureResourceManagerConnection: ${{ parameters.serviceConnection }}
            subscriptionId: ${{ parameters.subscriptionId }}
            resourceGroupName: ${{ parameters.resourceGroup }}
            location: ${{ parameters.location }}
            csmFile: '$(Pipeline.Workspace)/ARMTemplateForFactory.json'
            csmParametersFile: '$(Pipeline.Workspace)/ARMTemplateParametersForFactory.json'
            overrideParameters: ${{ parameters.overrideParameters }}
            deploymentMode: 'Incremental'
        # Reactivate ADF triggers after deployment and clean up the deployment history.
        # Sample: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery-sample-script
        - task: AzurePowerShell@5
          displayName: 'Start ADF Triggers'
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: ${{ parameters.serviceConnection }}
            scriptPath: '$(Pipeline.Workspace)/PrePostDeploymentScript.ps1'
            ScriptArguments: -armTemplate "$(Pipeline.Workspace)/ARMTemplateForFactory.json" -ResourceGroupName ${{ parameters.resourceGroup }} -DataFactoryName ${{ parameters.adfName }} -predeployment $false -deleteDeployment $true
            errorActionPreference: stop
            FailOnStandardError: false
            azurePowerShellVersion: 'LatestVersion'
            pwsh: true
The above YAML configuration can be saved in a file called “template_deploy.yml”.
NOTE: The PowerShell script (PrePostDeploymentScript.ps1) is generated during the build pipeline by the npm package made available by Microsoft and is published as part of the build artifact, which is why the deploy pipeline can reference it directly from $(Pipeline.Workspace).
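After a successful build, the downloaded artifact typically contains something like the following; the exact contents depend on your factory and the version of the npm package:

$(Pipeline.Workspace)/
├── ARMTemplateForFactory.json            # full factory ARM template
├── ARMTemplateParametersForFactory.json  # default parameter values
├── PrePostDeploymentScript.ps1           # stops/starts triggers around deployment
└── linkedTemplates/                      # split templates for large factories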
Joining the pipelines together: an Azure DevOps pipeline to build and deploy Azure Data Factory (ADF) pipelines
The release pipeline must follow the build pipeline so that it can retrieve the build artifacts created during the build phase. The DevOps pipeline below ensures that the deploy stages run only after the successful execution of the build stage, and it deploys to both the dev and QA Azure Data Factories. Deployment to the QA and prod data factories can additionally be gated with a manual approval step by configuring approvals and checks on the corresponding Azure DevOps environments.
variables:
- name: packageJSONFolderPath
  value: build/
- name: adfRootFolder
  value: app/adf/

# Manual runs only; replace with a trigger on master to build on every merge.
trigger: none

name: 'Build and deploy Azure Data Factory pipelines'

stages:
- stage: Build
  displayName: Build
  variables:
  - template: vars/dev.yml
  jobs:
  - template: templates/template_build.yml
    parameters:
      packageJSONFolderPath: ${{ variables.packageJSONFolderPath }}
      subscriptionId: ${{ variables.subscriptionId }}
      resourceGroup: ${{ variables.resourceGroup }}
      adfName: ${{ variables.adfName }}
      adfRootFolder: ${{ variables.adfRootFolder }}

- stage: DeployDev
  dependsOn: Build
  condition: succeeded()
  displayName: Deploy ADF pipelines to dev ADF
  variables:
  - template: vars/dev.yml
  jobs:
  - template: templates/template_deploy.yml
    parameters:
      subscriptionId: ${{ variables.subscriptionId }}
      resourceGroup: ${{ variables.resourceGroup }}
      adfName: ${{ variables.adfName }}
      adfRootFolder: ${{ variables.adfRootFolder }}
      deployEnvironment: ${{ variables.deployEnvironment }}
      serviceConnection: ${{ variables.serviceConnection }}
      location: ${{ variables.location }}
      overrideParameters: ${{ variables.overrideParameters }}

- stage: DeployQA
  dependsOn: DeployDev
  condition: succeeded()
  displayName: Deploy ADF pipelines to QA ADF
  variables:
  - template: vars/qa.yml
  jobs:
  - template: templates/template_deploy.yml
    parameters:
      subscriptionId: ${{ variables.subscriptionId }}
      resourceGroup: ${{ variables.resourceGroup }}
      adfName: ${{ variables.adfName }}
      adfRootFolder: ${{ variables.adfRootFolder }}
      deployEnvironment: ${{ variables.deployEnvironment }}
      serviceConnection: ${{ variables.serviceConnection }}
      location: ${{ variables.location }}
      overrideParameters: ${{ variables.overrideParameters }}
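The vars/dev.yml and vars/qa.yml variable templates referenced above aren’t shown in the Microsoft docs. A minimal sketch of vars/dev.yml with placeholder values might look like the following; overrideParameters commonly overrides at least the factory name, plus any connection strings or URLs parameterized in the ARM template:

# vars/dev.yml - environment-specific values (placeholders)
variables:
- name: subscriptionId
  value: '00000000-0000-0000-0000-000000000000'
- name: resourceGroup
  value: 'rg-dataplatform-dev'
- name: adfName
  value: 'adf-dataplatform-dev'
- name: deployEnvironment
  value: 'dev'              # Azure DevOps environment name (approvals can be set on it)
- name: serviceConnection
  value: 'sc-azure-dev'     # ARM service connection name
- name: location
  value: 'westeurope'
- name: overrideParameters
  value: '-factoryName "adf-dataplatform-dev"'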
I hope this article brings together the different resources in the Microsoft docs on setting up the new CI/CD flow for Azure Data Factory (ADF) using the npm package, and that you’re able to set up the same for your own ADF projects.