Feature Lifecycle — IDP Module Specification
Implementation guidelines for feature flagging, CI/CD automation, PR workflows, and deployment compliance gates — built on Azure-native services.
Feature Lifecycle Phases
Every feature follows a structured lifecycle from inception to retirement. Each phase maps to specific IDP modules, Azure DevOps pipelines, App Configuration feature flags, and compliance gates.
Lifecycle phases explained
What is done in each phase of the feature lifecycle on Azure.
Author
Code and changes are authored and submitted via pull requests. Branch policies and code review (e.g. in Azure Repos or GitHub) ensure quality and alignment with standards before merging.
Build
Azure DevOps Pipelines (or equivalent) run CI: build, test, and package the application. Artifacts are published; container images may be built and pushed to ACR. Quality and security gates run in the pipeline.
Gate
Pre-deployment compliance checks: Azure Policy, Defender for Cloud findings, and pipeline gates must pass. Deployment windows and approval steps ensure only compliant changes are released.
Flag
Feature flags are used for controlled rollout (e.g. Azure App Configuration Feature Manager). Teams target audiences, set rollout percentages, and can kill-switch or roll back without redeploying.
Retire
Feature flags and related configuration are cleaned up once the feature is fully rolled out or deprecated. Flags are removed from App Configuration and code paths are simplified.
This specification targets Microsoft Azure with Azure DevOps Pipelines for CI/CD, Azure App Configuration Feature Manager for feature flags, AKS for compute, and Azure Policy + Defender for Cloud for compliance gates.
Module Summary
Feature Flagging & Controlled Rollouts
Azure App Configuration Feature Manager, canary/blue-green deployments on AKS, progressive rollout patterns, and kill switches.
CI/CD Pipeline Automation
Multi-stage Azure DevOps YAML pipelines, artifact management, Helm deployments to AKS, Key Vault integration, and pipeline templates.
PR Approval Workflows & Codified CAB
Branch policies, build validation, codified Change Advisory Board with risk-based auto-approval, and quality gates.
Deployment Compliance Gates
Azure Policy enforcement, Defender for Cloud integration, deployment blackout windows, smoke tests, and immutable audit trails.
01 Feature Flagging & Controlled Rollouts
Feature Flag Lifecycle
Every feature flag follows a strict lifecycle managed through Azure App Configuration Feature Manager. Flags transition through four phases:
Azure App Configuration Feature Manager Required
All feature flags are centrally managed in Azure App Configuration using the Feature Manager capability. This provides a single source of truth for flag state across all environments.
- Targeting filter: percentage-based rollouts and user/group targeting
- Time window filter: auto-enable/disable flags within a date range
- Custom filters: evaluate flag state based on arbitrary application context
- Feature flags are environment-scoped — separate App Configuration instances per environment
- Flag changes require PR approval in the configuration-as-code repo
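The percentage and group targeting described above can be illustrated with a small, self-contained sketch. It is not the algorithm the Feature Manager SDKs use internally; the hash choice, the function names, and the rule that a matching group percentage overrides the default percentage are assumptions made for illustration.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable value in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}\n{user_id}".encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer and scale it to a percentage.
    return int.from_bytes(digest[:4], "big") / 2**32 * 100


def is_enabled_for(
    flag_name: str,
    user_id: str,
    groups: list[str],
    default_pct: float,
    group_pcts: dict[str, float],
) -> bool:
    """Evaluate a percentage rollout; a matching group percentage wins over the default."""
    bucket = rollout_bucket(flag_name, user_id)
    for group in groups:
        if group in group_pcts:
            return bucket < group_pcts[group]
    return bucket < default_pct
```

Because the bucket depends only on the flag name and the user ID, a given user keeps the same result as the rollout percentage grows, which keeps the experience stable during a progressive rollout.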
Deployment Strategies on AKS
Canary Deployment (Flagger + Istio)
Use Flagger with Istio service mesh for automated canary analysis. Traffic is gradually shifted from the stable to canary version based on success rate and latency metrics.
- Canary weight progression: 1% → 5% → 25% → 50% → 100%
- Automatic rollback if error rate exceeds threshold (default 1%)
- Canary analysis interval: 60 seconds
- Metrics: request success rate, p99 latency from Application Insights
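The progression and rollback rule above can be sketched as a simple loop. This is a simulation for illustration only, not how Flagger is implemented; `run_canary` and the `observe_error_rate` callback are hypothetical names, and each call to the callback stands in for one analysis interval.

```python
from typing import Callable

CANARY_WEIGHTS = [1, 5, 25, 50, 100]  # percent of traffic sent to the canary
ERROR_RATE_THRESHOLD = 0.01  # the 1% default threshold


def run_canary(observe_error_rate: Callable[[int], float]) -> tuple[str, list[int]]:
    """Step through the weights; roll back as soon as the error rate exceeds the threshold."""
    visited: list[int] = []
    for weight in CANARY_WEIGHTS:
        visited.append(weight)
        if observe_error_rate(weight) > ERROR_RATE_THRESHOLD:
            return "rolled_back", visited
    return "promoted", visited


if __name__ == "__main__":
    # Hypothetical canary that starts failing once it receives 25% of traffic.
    print(run_canary(lambda weight: 0.05 if weight >= 25 else 0.0))
```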
Blue-Green Deployment
Run two identical environments (blue/green) behind Azure Traffic Manager or Nginx Ingress canary annotations. Switch traffic atomically after validation.
- Azure Traffic Manager weighted routing for DNS-level traffic split
- Nginx Ingress canary-weight annotation for cluster-level routing
- Instant rollback by reverting traffic weight to 0
Flag Naming Conventions Required
All feature flags follow the pattern: <team>.<service>.<feature>
| Example | Description |
|---|---|
| payments.checkout.new-ui | New checkout UI for payments team |
| catalog.search.vector-ranking | Vector-based search ranking experiment |
| orders.fulfillment.batch-processing | Batch processing mode for fulfillment |
| identity.auth.passkey-login | Passkey authentication experiment |
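A naming check of this kind could run in the PR validation for the flag repository. The sketch below assumes each segment is lowercase kebab-case, which is inferred from the examples rather than stated by the convention; `parse_flag_name` is a hypothetical helper.

```python
import re

# Assumption: each segment is lowercase alphanumeric words joined by single hyphens.
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
FLAG_NAME_RE = re.compile(rf"({_SEGMENT})\.({_SEGMENT})\.({_SEGMENT})")


def parse_flag_name(name: str) -> dict[str, str]:
    """Split a flag name into team, service, and feature, or raise ValueError."""
    match = FLAG_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"flag name {name!r} does not match <team>.<service>.<feature>")
    team, service, feature = match.groups()
    return {"team": team, "service": service, "feature": feature}
```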
Flag Hygiene & Governance
- TTL policy: every flag must have a maximum lifespan (default 90 days)
- Stale flag automation: flags exceeding TTL trigger alerts and auto-create cleanup tickets
- Audit logging: all flag state changes recorded in Azure Monitor
- Flag debt dashboard: weekly report of active flags by team, age, and status
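The TTL policy can be expressed as a date calculation. The sketch below assumes each flag record carries a `created` date in ISO format (as in the mock data later in this module) and that the TTL is counted in calendar days from creation; the function names are hypothetical.

```python
from datetime import date, timedelta

DEFAULT_TTL_DAYS = 90  # default maximum lifespan from the TTL policy


def ttl_days_remaining(created: date, today: date, ttl_days: int = DEFAULT_TTL_DAYS) -> int:
    """Days left before the flag exceeds its TTL; negative once it is stale."""
    return (created + timedelta(days=ttl_days) - today).days


def stale_flags(flags: list[dict], today: date) -> list[str]:
    """Names of flags past their TTL, i.e. candidates for alerts and cleanup tickets."""
    return [
        flag["name"]
        for flag in flags
        if ttl_days_remaining(date.fromisoformat(flag["created"]), today) < 0
    ]
```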
Emergency Kill Switches Required
Every feature deployed behind a flag must have a kill switch. Kill switches instantly disable the feature without a deployment. They are toggled via Azure App Configuration REST API or the Azure Portal and take effect within seconds via real-time refresh.
Observability Integration
Correlate feature flag state with application metrics using Application Insights custom dimensions. Every request tagged with active feature flags enables:
- Error rate comparison: flag-on vs. flag-off cohorts
- Performance impact: latency delta per flag
- Business metrics: conversion rate per variant
- Custom KQL queries in Log Analytics for flag-based analysis
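The flag-on versus flag-off comparison amounts to grouping requests by flag state. The sketch below works on in-memory records rather than on Application Insights data; the record shape (`active_flags`, `failed`) is an assumption for illustration.

```python
from typing import Optional


def error_rate_by_cohort(requests: list[dict], flag: str) -> dict[str, Optional[float]]:
    """Compare error rates for requests that had the flag active against those that did not."""
    counts = {True: [0, 0], False: [0, 0]}  # flag state -> [total, failed]
    for request in requests:
        flag_on = flag in request["active_flags"]
        counts[flag_on][0] += 1
        counts[flag_on][1] += 1 if request["failed"] else 0

    def rate(total: int, failed: int) -> Optional[float]:
        # None signals an empty cohort rather than a 0% error rate.
        return failed / total if total else None

    return {"flag_on": rate(*counts[True]), "flag_off": rate(*counts[False])}
```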
Terraform: Azure App Configuration with Feature Flags
# modules/feature-flags/main.tf
resource "azurerm_app_configuration" "flags" {
  name                = "appconf-${var.environment}-${var.service_name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"
  identity {
    type = "SystemAssigned"
  }
  tags = var.common_tags
}
resource "azurerm_app_configuration_feature" "feature" {
  for_each               = { for f in var.feature_flags : f.name => f }
  configuration_store_id = azurerm_app_configuration.flags.id
  name                   = each.value.name
  label                  = var.environment
  enabled                = each.value.enabled
  description            = each.value.description
  dynamic "targeting_filter" {
    for_each = each.value.targeting != null ? [each.value.targeting] : []
    content {
      default_rollout_percentage = targeting_filter.value.percentage
      dynamic "groups" {
        for_each = targeting_filter.value.groups
        content {
          name               = groups.value.name
          rollout_percentage = groups.value.percentage
        }
      }
    }
  }
  dynamic "timewindow_filter" {
    for_each = each.value.time_window != null ? [each.value.time_window] : []
    content {
      start = timewindow_filter.value.start
      end   = timewindow_filter.value.end
    }
  }
}
# Grant AKS managed identity read access
resource "azurerm_role_assignment" "aks_reader" {
  scope                = azurerm_app_configuration.flags.id
  role_definition_name = "App Configuration Data Reader"
  principal_id         = var.aks_managed_identity_principal_id
}
Azure DevOps Pipeline: Canary Deployment with Flagger
# pipelines/canary-deploy.yaml
trigger:
  branches:
    include: [main]
pool:
  vmImage: 'ubuntu-latest'
# $(team) and $(service) are expected to be supplied outside this file
variables:
  - group: acr-credentials
  - name: acrName
    value: 'acmeacr'
  - name: imageRepository
    value: '$(Build.Repository.Name)'
  - name: tag
    value: '$(Build.BuildId)'
stages:
  - stage: Build
    displayName: 'Build & Push Image'
    jobs:
      - job: BuildImage
        steps:
          - task: Docker@2
            displayName: 'Build and push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(imageRepository)'
              command: 'buildAndPush'
              Dockerfile: '**/Dockerfile'
              tags: |
                $(tag)
                latest
  - stage: DeployCanary
    displayName: 'Deploy Canary to AKS'
    dependsOn: Build
    jobs:
      - deployment: CanaryDeploy
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out sources by default
                - checkout: self
                - task: KubernetesManifest@1
                  displayName: 'Update canary image'
                  inputs:
                    # 'deploy' applies the manifests and substitutes the image listed under 'containers'
                    action: 'deploy'
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    containers: |
                      $(acrName).azurecr.io/$(imageRepository):$(tag)
                    manifests: |
                      k8s/deployment.yaml
  - stage: ValidateCanary
    displayName: 'Validate Canary Health'
    dependsOn: DeployCanary
    jobs:
      - job: HealthCheck
        steps:
          # Assumes kubectl on this agent is already authenticated against the cluster
          - script: |
              echo "Waiting for Flagger canary analysis..."
              kubectl -n $(team) wait canary/$(service) \
                --for=condition=promoted --timeout=600s
            displayName: 'Wait for Flagger promotion'
          - script: |
              CANARY_STATUS=$(kubectl -n $(team) get canary/$(service) \
                -o jsonpath='{.status.phase}')
              if [ "$CANARY_STATUS" != "Succeeded" ]; then
                echo "##vso[task.logissue type=error]Canary failed: $CANARY_STATUS"
                exit 1
              fi
            displayName: 'Verify canary status'
Blue-Green Deployment Pipeline
# pipelines/blue-green-deploy.yaml
stages:
  - stage: DeployGreen
    displayName: 'Deploy to Green Slot'
    jobs:
      - deployment: GreenDeploy
        environment: 'aks-production.green'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out sources by default; the chart lives in the repo
                - checkout: self
                - task: HelmDeploy@0
                  displayName: 'Helm upgrade green'
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)-green'
                    overrideValues: |
                      image.tag=$(tag)
                      slot=green
  - stage: SmokeTestGreen
    displayName: 'Smoke Test Green'
    dependsOn: DeployGreen
    jobs:
      - job: SmokeTest
        steps:
          # The cluster-local DNS name only resolves from an agent running inside the cluster network
          - script: |
              curl -sf http://$(service)-green.$(team).svc.cluster.local/health
            displayName: 'Health check green'
  - stage: SwitchTraffic
    displayName: 'Switch Traffic to Green'
    dependsOn: SmokeTestGreen
    jobs:
      - deployment: TrafficSwitch
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
                # Assumes kubectl is authenticated and that the patched Ingress is the one
                # routing to green; Nginx applies canary-weight on the canary Ingress
                - script: |
                    kubectl -n $(team) patch ingress $(service) \
                      -p '{"metadata":{"annotations":{
                        "nginx.ingress.kubernetes.io/canary":"true",
                        "nginx.ingress.kubernetes.io/canary-weight":"100"
                      }}}'
                  displayName: 'Route 100% to green'
.NET SDK Integration: Microsoft.FeatureManagement
// src/Startup.cs
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.FeatureManagement;
using Microsoft.FeatureManagement.FeatureFilters;
public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Register the App Configuration refresh services; the store itself is
        // connected where configuration is built (e.g. in Program.cs)
        services.AddAzureAppConfiguration();
        // Add feature management with targeting
        services.AddFeatureManagement()
            .AddFeatureFilter<TargetingFilter>()
            .AddFeatureFilter<TimeWindowFilter>()
            .AddFeatureFilter<PercentageFilter>();
        // Register targeting context accessor.
        // HttpContextTargetingContextAccessor is an application-defined class
        // that implements ITargetingContextAccessor.
        services.AddSingleton<ITargetingContextAccessor,
            HttpContextTargetingContextAccessor>();
    }
    public void Configure(IApplicationBuilder app)
    {
        // Enable dynamic configuration refresh
        app.UseAzureAppConfiguration();
    }
}
// Usage in a controller
[ApiController]
[Route("api/[controller]")]
public class CheckoutController : ControllerBase
{
    private readonly IFeatureManager _featureManager;
    public CheckoutController(IFeatureManager featureManager)
    {
        _featureManager = featureManager;
    }
    [HttpGet]
    public async Task<IActionResult> GetCheckout()
    {
        if (await _featureManager
            .IsEnabledAsync("payments.checkout.new-ui"))
        {
            return Ok(new { version = "v2", ui = "new" });
        }
        return Ok(new { version = "v1", ui = "classic" });
    }
}
Python SDK Integration
# src/feature_flags.py
from azure.appconfiguration.provider import load
from azure.identity import DefaultAzureCredential
from featuremanagement import FeatureManager, TargetingContext
credential = DefaultAzureCredential()
# Load configuration with feature flags
config = load(
    endpoint="https://appconf-prod-checkout.azconfig.io",
    credential=credential,
    feature_flag_enabled=True,
    feature_flag_refresh_enabled=True,
    refresh_interval=30,  # seconds
)
feature_manager = FeatureManager(config)
# Check flag state
def get_checkout_experience(user_id: str):
    """Return checkout experience based on feature flag."""
    # Refresh is pull-based: flags are re-fetched only once refresh_interval has elapsed
    config.refresh()
    # The targeting filter reads the user and groups from a TargetingContext
    context = TargetingContext(user_id=user_id, groups=["beta-testers"])
    if feature_manager.is_enabled(
        "payments.checkout.new-ui", context
    ):
        return {"version": "v2", "ui": "new"}
    return {"version": "v1", "ui": "classic"}
Flagger Canary Resource
# k8s/canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-service
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  service:
    port: 8080
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - checkout.acme.com
  analysis:
    interval: 60s
    threshold: 5
    # Matches the 1% → 5% → 25% → 50% → 100% progression; promotion to 100% follows the last step
    stepWeights: [1, 5, 25, 50]
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 60s
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 60s
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.istio-system/
        metadata:
          cmd: "hey -z 60s -q 10 -c 2 http://checkout-service-canary.payments:8080/"
Azure Traffic Manager: Weighted Routing
# modules/traffic-manager/main.tf
resource "azurerm_traffic_manager_profile" "canary" {
  name                   = "tm-${var.service_name}-canary"
  resource_group_name    = var.resource_group_name
  traffic_routing_method = "Weighted"
  dns_config {
    relative_name = var.service_name
    ttl           = 30
  }
  monitor_config {
    protocol                     = "HTTPS"
    port                         = 443
    path                         = "/health"
    interval_in_seconds          = 10
    timeout_in_seconds           = 5
    tolerated_number_of_failures = 2
  }
  tags = var.common_tags
}
resource "azurerm_traffic_manager_azure_endpoint" "stable" {
  name               = "stable"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.stable_public_ip_id
  weight             = var.stable_weight # e.g., 95
}
resource "azurerm_traffic_manager_azure_endpoint" "canary" {
  name               = "canary"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.canary_public_ip_id
  weight             = var.canary_weight # e.g., 5
}
Glossary
| Term | Definition |
|---|---|
| Feature Flag | A runtime toggle that controls feature visibility without code deployment. Stored in Azure App Configuration. |
| Feature Filter | A rule that determines whether a flag is enabled for a given context (targeting, time window, custom). |
| Targeting | Evaluating flag state based on user identity, group membership, or percentage rollout. |
| Canary Deployment | Gradually shifting traffic to a new version while monitoring health metrics. Uses Flagger + Istio on AKS. |
| Blue-Green Deployment | Running two identical environments and switching traffic atomically between them. |
| Progressive Rollout | Incrementally increasing the percentage of users exposed to a feature: 1% → 5% → 25% → 50% → 100%. |
| Kill Switch | An emergency flag disable mechanism that takes effect within seconds via App Configuration refresh. |
| Flag Debt | Accumulated feature flags that have outlived their purpose but remain in code and configuration. |
| Variant | A specific configuration value associated with a feature flag, enabling A/B testing with multiple options. |
| A/B Test | An experiment comparing two or more variants of a feature using statistical analysis. |
| Azure App Configuration | Centralized Azure service for managing application settings and feature flags. |
| Feature Manager | The App Configuration capability that provides feature flag evaluation with filters. |
| Flagger | A Kubernetes operator that automates canary, A/B testing, and blue-green deployments. |
| Istio | An open-source service mesh providing traffic management, security, and observability for Kubernetes. |
| Traffic Manager | Azure DNS-based traffic load balancer supporting weighted, priority, and geographic routing. |
| Application Insights | Azure APM service for monitoring live applications, detecting anomalies, and diagnosing issues. |
Component: FeatureFlagDashboard
Screen Layout: A data table listing all feature flags with columns for name, status (active/canary/retired), targeting rules, rollout percentage progress bar, environment scope, TTL remaining, and last modified date. Filter controls for team, status, and environment. Bulk actions for enable/disable/archive. Each row expandable to show targeting details and flag audit history.
Component: CanaryMonitor
Screen Layout: Real-time dashboard showing active canary deployments. Each canary displays a progress gauge (current weight %), request success rate chart, p99 latency sparkline, and Flagger phase indicator (Initializing → Progressing → Promoting → Succeeded/Failed). Error budget burn-down visualization.
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/flags | List all feature flags with state and targeting |
| POST | /api/flags | Create a new feature flag |
| PUT | /api/flags/{name} | Update flag state or targeting rules |
| DELETE | /api/flags/{name} | Retire and remove a feature flag |
| GET | /api/canary/{service} | Get canary deployment status and metrics |
| POST | /api/canary/{service}/rollback | Trigger immediate canary rollback |
Mock Data
{
"feature_flags": [
{ "name": "payments.checkout.new-ui", "status": "active", "rollout_pct": 50, "env": "production", "ttl_days_remaining": 45, "created": "2025-12-01" },
{ "name": "catalog.search.vector-ranking", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 82, "created": "2025-12-15" },
{ "name": "orders.fulfillment.batch-processing", "status": "active", "rollout_pct": 100, "env": "production", "ttl_days_remaining": 12, "created": "2025-10-01" },
{ "name": "identity.auth.passkey-login", "status": "active", "rollout_pct": 25, "env": "staging", "ttl_days_remaining": 67, "created": "2025-12-20" },
{ "name": "payments.refunds.instant-refund", "status": "canary", "rollout_pct": 1, "env": "production", "ttl_days_remaining": 88, "created": "2026-01-05" },
{ "name": "catalog.recommendations.ml-v3", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-08-01" },
{ "name": "orders.tracking.real-time-map", "status": "active", "rollout_pct": 75, "env": "production", "ttl_days_remaining": 30, "created": "2025-11-15" },
{ "name": "identity.mfa.biometric-prompt", "status": "active", "rollout_pct": 10, "env": "staging", "ttl_days_remaining": 60, "created": "2025-12-28" },
{ "name": "payments.pricing.dynamic-discount", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 85, "created": "2026-01-10" },
{ "name": "catalog.images.webp-optimization", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-07-01" }
]
}
02 CI/CD Pipeline Automation
Pipeline Architecture Required
All CI/CD pipelines use Azure DevOps multi-stage YAML pipelines stored in the application repository. Pipelines follow a four-stage model:
Branch Strategy
Trunk-based development with short-lived feature branches. Release branches for hotfixes only.
- main — production-ready trunk, CI triggers on merge
- feature/* — short-lived branches (< 3 days), PR required for merge
- release/* — created from main for hotfix releases only
- No long-lived develop branches — feature flags replace them
Pipeline Triggers & Approvals
- CI trigger: automatic on PR merge to main
- Scheduled builds: nightly for integration tests and security scans
- Environment approvals: staging auto-deploys; production requires manual approval
- Approval checks: branch protection, required reviews, artifact verification
Artifact Management
Use Azure Artifacts for package feeds (NuGet, npm) and Azure Container Registry (ACR) for Docker images.
- All artifacts are versioned with build ID and Git SHA
- ACR geo-replicated to eastus2 and westus2 for redundancy
- Image vulnerability scanning enabled via Defender for Containers
- Retention policy: keep last 30 production images, purge untagged after 7 days
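The retention rule above can be sketched as a selection over image records. In practice this would be delegated to the registry's own retention tooling; the record shape (`digest`, `tags`, `production`, `pushed`) and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta

KEEP_PRODUCTION_IMAGES = 30
UNTAGGED_MAX_AGE = timedelta(days=7)


def images_to_purge(images: list[dict], now: datetime) -> list[str]:
    """Return digests to delete: production images beyond the newest 30, and stale untagged images."""
    purge: list[str] = []
    production = sorted(
        (image for image in images if image["production"]),
        key=lambda image: image["pushed"],
        reverse=True,  # newest first
    )
    purge.extend(image["digest"] for image in production[KEEP_PRODUCTION_IMAGES:])
    for image in images:
        is_stale_untagged = not image["tags"] and now - image["pushed"] > UNTAGGED_MAX_AGE
        if is_stale_untagged and image["digest"] not in purge:
            purge.append(image["digest"])
    return purge
```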
Secret Management in Pipelines Required
Never store secrets as pipeline variables. Use variable groups linked to Azure Key Vault to inject secrets at runtime. Service connections use workload identity federation (no client secrets).
Pipeline Optimization
- Caching: cache NuGet/npm packages, Docker layers between builds
- Parallel jobs: run unit tests and lint checks in parallel
- Shared templates: central pipeline template repo to avoid duplication
- Notifications: Microsoft Teams channel integration for build status
Multi-Stage Pipeline: .NET Microservice Required
# pipelines/dotnet-microservice.yaml
trigger:
  branches:
    include: [main]
pr:
  branches:
    include: [main]
variables:
  - group: keyvault-secrets # Linked to Azure Key Vault
  - name: buildConfiguration
    value: 'Release'
  - name: acrName
    value: 'acmeacr'
  # Package cache location read by NuGet and by the Cache@2 step below
  - name: NUGET_PACKAGES
    value: '$(Pipeline.Workspace)/.nuget/packages'
stages:
  - stage: Build
    displayName: 'Build & Unit Test'
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: UseDotNet@2
            inputs:
              version: '8.x'
          - task: Cache@2
            inputs:
              key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
              restoreKeys: 'nuget | "$(Agent.OS)"'
              path: '$(NUGET_PACKAGES)'
          - script: dotnet restore
            displayName: 'Restore packages'
          - script: dotnet build --configuration $(buildConfiguration) --no-restore
            displayName: 'Build'
          - script: dotnet test --configuration $(buildConfiguration) --no-build --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)/coverage
            displayName: 'Run unit tests'
          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: '$(Agent.TempDirectory)/coverage/**/coverage.cobertura.xml'
  - stage: SecurityScan
    displayName: 'Security & Quality'
    dependsOn: Build
    jobs:
      - job: SonarQube
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: SonarQubePrepare@5
            inputs:
              SonarQube: 'sonarqube-connection'
              scannerMode: 'MSBuild'
              projectKey: '$(Build.Repository.Name)'
          - script: dotnet build --configuration $(buildConfiguration)
            displayName: 'Build for analysis'
          - task: SonarQubeAnalyze@5
          - task: SonarQubePublish@5
            inputs:
              pollingTimeoutSec: '300'
      - job: ContainerScan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: Docker@2
            displayName: 'Build image'
            inputs:
              # Registry and repository are set so the local image name matches the one Trivy scans
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'build'
              Dockerfile: '**/Dockerfile'
              tags: 'scan-$(Build.BuildId)'
          # Assumes the trivy CLI is already installed on the agent
          - script: |
              trivy image --exit-code 1 --severity HIGH,CRITICAL \
                $(acrName).azurecr.io/$(Build.Repository.Name):scan-$(Build.BuildId)
            displayName: 'Trivy vulnerability scan'
  - stage: PushImage
    displayName: 'Push to ACR'
    dependsOn: SecurityScan
    # Skip on PR validation runs so that unmerged code is never pushed or deployed
    condition: and(succeeded(), ne(variables['Build.Reason'], 'PullRequest'))
    jobs:
      - job: Push
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: Docker@2
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: |
                $(Build.BuildId)
                $(Build.SourceVersion)
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: PushImage
    jobs:
      - deployment: StagingDeploy
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out sources by default; the chart lives in the repo
                - checkout: self
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-staging'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-staging.yaml'
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: DeployStaging
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: ProductionDeploy
        environment: 'aks-production' # Has approval gate
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-production'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-production.yaml'
Shared Pipeline Template (Extends Pattern)
# templates/build-test-deploy.yaml
# Shared template — consumed via "extends" pattern
parameters:
  - name: service
    type: string
  - name: team
    type: string
  - name: language
    type: string
    values: ['dotnet', 'node', 'python']
  - name: chartPath
    type: string
    default: 'charts/$(service)'
stages:
  - stage: Build
    jobs:
      - job: Build
        steps:
          - template: steps/${{ parameters.language }}-build.yaml
  - stage: Test
    dependsOn: Build
    jobs:
      - job: QualityGate
        steps:
          - template: steps/sonarqube-scan.yaml
          - template: steps/container-scan.yaml
  - stage: DeployStaging
    dependsOn: Test
    jobs:
      - deployment: Staging
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - template: steps/helm-deploy.yaml
                  parameters:
                    environment: staging
                    service: ${{ parameters.service }}
                    team: ${{ parameters.team }}
# Consumer pipeline (azure-pipelines.yaml in service repo):
# extends:
#   template: templates/build-test-deploy.yaml@pipeline-templates
#   parameters:
#     service: checkout-service
#     team: payments
#     language: dotnet
Terraform Pipeline: Plan on PR, Apply on Merge
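# pipelines/terraform.yaml
trigger:
  branches:
    include: [main]
  paths:
    include: ['infra/**']
pr:
  branches:
    include: [main]
  paths:
    include: ['infra/**']
stages:
  - stage: Plan
    displayName: 'Terraform Plan'
    jobs:
      - job: TerraformPlan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: TerraformInstaller@1
            inputs:
              terraformVersion: '1.7.x'
          - task: TerraformCLI@1
            displayName: 'Init'
            inputs:
              command: 'init'
              workingDirectory: 'infra/$(environment)'
              backendType: 'azurerm'
              backendServiceArm: 'terraform-sp'
              backendAzureRmResourceGroupName: 'rg-terraform-state'
              backendAzureRmStorageAccountName: 'stterraformstate'
              backendAzureRmContainerName: 'tfstate'
              backendAzureRmKey: '$(environment)/terraform.tfstate'
          - task: TerraformCLI@1
            displayName: 'Plan'
            inputs:
              command: 'plan'
              workingDirectory: 'infra/$(environment)'
              environmentServiceName: 'terraform-sp'
              commandOptions: '-out=tfplan'
          - task: TerraformCLI@1
            displayName: 'Show Plan'
            inputs:
              command: 'show'
              workingDirectory: 'infra/$(environment)'
              inputTargetPlanOrStateFilePath: 'infra/$(environment)/tfplan'
          # The Apply stage runs on a different agent, so the plan file is handed over as an artifact
          - publish: '$(System.DefaultWorkingDirectory)/infra/$(environment)/tfplan'
            artifact: 'tfplan'
            displayName: 'Publish plan'
  - stage: Apply
    displayName: 'Terraform Apply'
    dependsOn: Plan
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: TerraformApply
        environment: 'infra-$(environment)' # Approval gate
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out sources by default;
                # artifacts from this run are downloaded to $(Pipeline.Workspace) automatically
                - checkout: self
                - task: TerraformInstaller@1
                  inputs:
                    terraformVersion: '1.7.x'
                - task: TerraformCLI@1
                  displayName: 'Init'
                  inputs:
                    command: 'init'
                    workingDirectory: 'infra/$(environment)'
                    backendType: 'azurerm'
                    backendServiceArm: 'terraform-sp'
                    backendAzureRmResourceGroupName: 'rg-terraform-state'
                    backendAzureRmStorageAccountName: 'stterraformstate'
                    backendAzureRmContainerName: 'tfstate'
                    backendAzureRmKey: '$(environment)/terraform.tfstate'
                - task: TerraformCLI@1
                  displayName: 'Apply'
                  inputs:
                    command: 'apply'
                    workingDirectory: 'infra/$(environment)'
                    environmentServiceName: 'terraform-sp'
                    commandOptions: '$(Pipeline.Workspace)/tfplan/tfplan'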
Variable Group Linked to Key Vault
# In Azure DevOps UI or REST API:
# Variable group "keyvault-secrets" linked to Key Vault "kv-prod-checkout"
# Secrets auto-mapped: db-connection-string, api-key, jwt-secret
# Pipeline usage:
variables:
  - group: keyvault-secrets
steps:
  - script: |
      # Secret variables are not exposed to scripts automatically;
      # they reach this step only through the explicit env mapping below
      echo "Connection verified"
      dotnet run --urls "http://+:8080"
    env:
      ConnectionStrings__Default: $(db-connection-string)
      ApiKey: $(api-key)
      Jwt__Secret: $(jwt-secret)
    displayName: 'Run with Key Vault secrets'
Node.js Microservice Pipeline
# pipelines/node-microservice.yaml
variables:
  # npm reads this variable as its cache directory, so Cache@2 and npm ci use the same path
  - name: npm_config_cache
    value: '$(Pipeline.Workspace)/.npm'
stages:
  - stage: Build
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '20.x'
          - task: Cache@2
            inputs:
              key: 'npm | "$(Agent.OS)" | package-lock.json'
              path: '$(npm_config_cache)'
          - script: npm ci
            displayName: 'Install dependencies'
          - script: npm run lint
            displayName: 'Lint'
          - script: npm run test:coverage
            displayName: 'Run tests with coverage'
          - task: PublishTestResults@2
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: 'junit.xml'
          - script: npm run build
            displayName: 'Build'
          - task: Docker@2
            displayName: 'Build & push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: '$(Build.BuildId)'
Glossary
| Term | Definition |
|---|---|
| CI/CD | Continuous Integration / Continuous Delivery — automated build, test, and deployment pipeline. |
| Pipeline Stage | A logical boundary in a pipeline (Build, Test, Deploy) that can have its own agent pool and conditions. |
| Job | A unit of work within a stage that runs on a single agent. |
| Step | The smallest unit of execution — a script or task within a job. |
| Task | A pre-built Azure DevOps building block (e.g., Docker@2, HelmDeploy@0). |
| Environment | An Azure DevOps resource representing a deployment target with approval gates. |
| Approval Gate | A manual or automated check that must pass before a pipeline stage can execute. |
| Variable Group | A named set of variables that can be linked to Azure Key Vault for secret injection. |
| Service Connection | An Azure DevOps resource providing authenticated access to external services (Azure, ACR, K8s). |
| Workload Identity Federation | Secretless authentication using federated credentials — no client secret or certificate stored. |
| Pipeline Template | Reusable YAML defining stages, jobs, or steps that can be consumed via extends or template references. |
| Extends Template | A pipeline pattern where the consumer pipeline inherits stages from a shared template. |
| Azure Artifacts | Package management service for NuGet, npm, Maven, Python, and Universal Packages. |
| ACR Task | Azure Container Registry's built-in build engine for building images without a local Docker daemon. |
| Helm Release | A deployed instance of a Helm chart on a Kubernetes cluster. |
Component: PipelineDashboard
Screen Layout: A table of recent pipeline runs with columns for pipeline name, trigger (CI/manual/scheduled), branch, stages with colored status indicators (green=succeeded, yellow=running, red=failed, gray=skipped), duration, and actor. Click a row to see stage details with individual step logs. Filter by team, status, and date range. Summary cards at top showing: total runs today, success rate, average duration, and deployments count.
Component: DeploymentTimeline
Screen Layout: Horizontal swimlane timeline with rows per environment (Dev, Staging, Production). Each deployment is a colored marker on the timeline showing service name, version, and timestamp. Hover for deployment details. Vertical lines connect promotions across environments.
Mock Data
{
"pipeline_runs": [
{ "id": 1042, "pipeline": "checkout-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 14, "actor": "jsmith" },
{ "id": 1041, "pipeline": "catalog-api", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "pending"}, "duration_min": 11, "actor": "adoe" },
{ "id": 1040, "pipeline": "order-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "failed", "staging": "skipped", "production": "skipped"}, "duration_min": 6, "actor": "mchen" },
{ "id": 1039, "pipeline": "identity-service", "trigger": "manual", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 18, "actor": "kpatel" },
{ "id": 1038, "pipeline": "infra-terraform", "trigger": "ci", "branch": "main", "stages": {"plan": "succeeded", "apply": "succeeded"}, "duration_min": 8, "actor": "platform-bot" },
{ "id": 1037, "pipeline": "checkout-service", "trigger": "ci", "branch": "feature/new-cart", "stages": {"build": "succeeded", "test": "succeeded", "staging": "skipped", "production": "skipped"}, "duration_min": 9, "actor": "jsmith" },
{ "id": 1036, "pipeline": "payment-processor", "trigger": "scheduled", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "running"}, "duration_min": 12, "actor": "scheduler" },
{ "id": 1035, "pipeline": "notification-svc", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 10, "actor": "rlee" }
],
"summary": { "runs_today": 24, "success_rate": 87.5, "avg_duration_min": 12.3, "deployments_today": 6 }
}
03 PR Approval Workflows & Codified CAB
Branch Policy Configuration Required
All repositories enforce branch policies on main:
- Minimum reviewers: 2 approvals required (1 for documentation-only changes)
- Required reviewers by path: auto-assigned based on file path patterns (CODEOWNERS equivalent)
- Build validation: CI pipeline must pass before merge is allowed
- Comment resolution: all PR comments must be resolved
- Work item linking: every PR linked to an Azure Boards work item
Status Checks
- SonarQube quality gate: must pass (no new bugs, 80% coverage on new code)
- Defender for DevOps: security scan for secrets, vulnerabilities, and IaC misconfigurations
- License compliance: third-party license audit (no GPL in proprietary code)
- Build validation pipeline: compile + unit tests must succeed
Merge Strategies
| Branch Type | Strategy | Rationale |
|---|---|---|
| feature/* → main | Squash merge | Clean linear history, one commit per feature |
| release/* → main | Merge commit | Preserve release branch history for audit |
| hotfix/* → main | Merge commit | Preserve hotfix context |
Code Review SLA
| Priority | Response Time | Approval Time |
|---|---|---|
| P1 — Critical | < 1 hour | < 4 hours |
| P2 — High | < 4 hours | < 8 hours |
| P3 — Standard | < 8 hours | < 24 hours |
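A minimal sketch of how the SLA table could be checked for a single PR. It assumes wall-clock hours and measures both limits from the time the PR was opened; neither is stated in the table, so treat both as assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# (response limit, approval limit) per priority, taken from the SLA table.
SLA = {
    "P1": (timedelta(hours=1), timedelta(hours=4)),
    "P2": (timedelta(hours=4), timedelta(hours=8)),
    "P3": (timedelta(hours=8), timedelta(hours=24)),
}


def sla_breaches(
    priority: str,
    opened_at: datetime,
    first_response_at: Optional[datetime],
    approved_at: Optional[datetime],
    now: datetime,
) -> list[str]:
    """Return which SLA targets ("response", "approval") have been missed."""
    response_limit, approval_limit = SLA[priority]
    breaches = []
    # The table uses strict "<" limits, so reaching the limit counts as a breach.
    if (first_response_at or now) - opened_at >= response_limit:
        breaches.append("response")
    if (approved_at or now) - opened_at >= approval_limit:
        breaches.append("approval")
    return breaches
```

Events that have not happened yet are passed as None and are measured against `now`, so the same function works for open and closed PRs.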
Codified CAB (Change Advisory Board) Required
The CAB process is automated through Azure DevOps pipeline environment approvals, replacing manual meetings for most changes.
Risk Assessment
Every PR is automatically scored for change risk based on:
- Files changed: infrastructure/security files score higher
- Blast radius: number of downstream services affected
- Service tier: Tier-1 (customer-facing) changes score higher than Tier-3 (internal tooling)
- Change size: large diffs score higher
Auto-Approval Rules
| Risk Level | Score | Approval |
|---|---|---|
| Low | 0–30 | Auto-approved (documentation, config, non-Tier-1) |
| Medium | 31–70 | Single approver from service owner group |
| High | 71–100 | Multi-approver: service owner + platform team + security |
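The table above maps directly to a small routing function. The sketch below is illustrative; the approver group names are placeholders, and it does not model the additional low-risk conditions (documentation, config, non-Tier-1) noted in the table.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to the level bands in the auto-approval table."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 30:
        return "low"
    if score <= 70:
        return "medium"
    return "high"


# Placeholder group names; an empty list means the change is auto-approved.
APPROVERS = {
    "low": [],
    "medium": ["service-owners"],
    "high": ["service-owners", "platform-team", "security-team"],
}


def required_approvers(score: int) -> list[str]:
    """Return the approver groups that must sign off for a given risk score."""
    return APPROVERS[risk_level(score)]
```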
CAB Automation
- Auto-generate change summary from PR title, description, and file diff stats
- Integration with ServiceNow via Azure DevOps Service Hooks for change record creation
- Post-approval: deployment pipeline triggered automatically
- Audit trail: all CAB decisions logged with risk score, approvers, and timestamps
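As a rough illustration of the first bullet, a change summary can be assembled from the PR title, description, and diff stats as below. The output field names are hypothetical and are not the ServiceNow change schema; a real integration would map them to whatever fields the target instance expects.

```python
from dataclasses import dataclass


@dataclass
class DiffStats:
    files_changed: int
    insertions: int
    deletions: int


def change_summary(title: str, description: str, stats: DiffStats, risk_score: int) -> dict:
    """Build a change summary payload (hypothetical field names) from PR metadata."""
    body = description.strip()
    first_line = body.splitlines()[0] if body else ""
    return {
        "short_description": title.strip(),
        "description": first_line,
        "diff": f"{stats.files_changed} files changed, +{stats.insertions}/-{stats.deletions}",
        "risk_score": risk_score,
    }
```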
Quality Gates
- Unit test coverage: ≥80% on new code, ≥60% overall
- Integration test coverage: ≥60% for service boundary tests
- Complexity limits: cyclomatic complexity < 15 per method
- Duplicated lines: < 3% duplication in new code
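The thresholds above can be expressed as a single evaluation step. This is a sketch only: the metric key names are assumptions, and in practice the values would come from the SonarQube quality gate rather than a hand-built dictionary.

```python
def evaluate_quality_gates(metrics: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per quality gate; metric key names are illustrative."""
    return {
        "new_code_coverage": metrics["new_code_coverage"] >= 80.0,
        "overall_coverage": metrics["overall_coverage"] >= 60.0,
        "integration_coverage": metrics["integration_coverage"] >= 60.0,
        "max_cyclomatic_complexity": metrics["max_cyclomatic_complexity"] < 15,
        "new_code_duplication": metrics["new_code_duplication"] < 3.0,
    }


def quality_gate_passed(metrics: dict[str, float]) -> bool:
    """The overall gate passes only when every individual threshold is met."""
    return all(evaluate_quality_gates(metrics).values())
```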
PR Template
## Summary
## Type of Change
- [ ] Feature (new functionality)
- [ ] Bug fix
- [ ] Infrastructure / Config
- [ ] Documentation
- [ ] Refactor (no behavior change)
## Checklist
- [ ] Unit tests added/updated (coverage ≥80%)
- [ ] Integration tests added (if applicable)
- [ ] Documentation updated
- [ ] Feature flag created (if new feature)
- [ ] Runbook updated (if operational change)
- [ ] Security review (if auth/data changes)
- [ ] Database migration tested (if schema change)
## Risk Assessment
- **Blast radius:** [Low / Medium / High]
- **Service tier:** [Tier-1 / Tier-2 / Tier-3]
- **Rollback plan:** [Describe rollback strategy]
## Related Work Items
AB#
Change Risk Assessment Script
#!/usr/bin/env python3
"""Automated change risk assessment for Azure DevOps PRs."""
import json
import os
import sys
from dataclasses import dataclass

HIGH_RISK_PATHS = [
    "infra/", "terraform/", "k8s/", "charts/",
    "Dockerfile", "docker-compose",
    ".github/", ".azuredevops/", "pipelines/",
    "migrations/", "security/", "auth/",
]

TIER_1_SERVICES = [
    "checkout-service", "payment-processor",
    "identity-service", "order-service",
]


@dataclass
class RiskScore:
    total: int
    level: str  # low, medium, high
    factors: list


def assess_risk(
    changed_files: list[str],
    lines_changed: int,
    service_name: str,
    downstream_count: int,
) -> RiskScore:
    score = 0
    factors = []

    # File path risk
    infra_files = [f for f in changed_files
                   if any(f.startswith(p) for p in HIGH_RISK_PATHS)]
    if infra_files:
        score += 30
        factors.append(f"Infrastructure files: {len(infra_files)}")

    # Blast radius
    if downstream_count > 5:
        score += 25
        factors.append(f"High blast radius: {downstream_count} services")
    elif downstream_count > 2:
        score += 15
        factors.append(f"Medium blast radius: {downstream_count} services")

    # Service tier
    if service_name in TIER_1_SERVICES:
        score += 20
        factors.append(f"Tier-1 service: {service_name}")

    # Change size
    if lines_changed > 500:
        score += 15
        factors.append(f"Large change: {lines_changed} lines")
    elif lines_changed > 200:
        score += 10
        factors.append(f"Medium change: {lines_changed} lines")

    # Database migrations
    migration_files = [f for f in changed_files if "migration" in f.lower()]
    if migration_files:
        score += 20
        factors.append(f"Database migration: {len(migration_files)} files")

    level = "low" if score <= 30 else "medium" if score <= 70 else "high"
    return RiskScore(total=min(score, 100), level=level, factors=factors)


if __name__ == "__main__":
    result = assess_risk(
        changed_files=json.loads(os.environ.get("CHANGED_FILES", "[]")),
        lines_changed=int(os.environ.get("LINES_CHANGED", "0")),
        service_name=os.environ.get("SERVICE_NAME", ""),
        downstream_count=int(os.environ.get("DOWNSTREAM_COUNT", "0")),
    )
    print(json.dumps({
        "score": result.total,
        "level": result.level,
        "factors": result.factors,
    }, indent=2))
    # Set Azure DevOps variable for pipeline gates
    print(f"##vso[task.setvariable variable=riskLevel]{result.level}")
    print(f"##vso[task.setvariable variable=riskScore]{result.total}")
Build Validation Pipeline
# pipelines/pr-validation.yaml
# Triggered as a build validation policy on the main branch
trigger: none

pr:
  branches:
    include: [main]

pool:
  vmImage: 'ubuntu-latest'

steps:
  # Full history is required so origin/main...HEAD can be diffed in the risk step
  - checkout: self
    fetchDepth: 0

  # In MSBuild scanner mode, SonarQube must be prepared before the build runs
  - task: SonarQubePrepare@5
    inputs:
      SonarQube: 'sonarqube-connection'
      scannerMode: 'MSBuild'
      projectKey: '$(Build.Repository.Name)'

  - script: dotnet restore && dotnet build --configuration Release
    displayName: 'Build'

  # --no-build needs the same configuration that was built above
  - script: dotnet test --no-build --configuration Release --collect:"XPlat Code Coverage"
    displayName: 'Unit tests'

  - task: SonarQubeAnalyze@5
  - task: SonarQubePublish@5

  # Risk assessment
  - script: |
      CHANGED=$(git diff --name-only origin/main...HEAD)
      LINES=$(git diff --stat origin/main...HEAD | tail -1 | awk '{print $4+$6}')
      export CHANGED_FILES=$(echo "$CHANGED" | jq -R . | jq -s .)
      export LINES_CHANGED=${LINES:-0}
      export SERVICE_NAME=$(basename $(pwd))
      python scripts/risk-assessment.py
    displayName: 'Risk assessment'

  - script: |
      echo "Risk Level: $(riskLevel)"
      echo "Risk Score: $(riskScore)"
      # task.addattachment expects a file path, so write the result to a file first
      echo "$(riskLevel):$(riskScore)" > "$(Agent.TempDirectory)/risk.txt"
      echo "##vso[task.addattachment type=risk-assessment;name=risk]$(Agent.TempDirectory)/risk.txt"
    displayName: 'Report risk score'
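The pipeline derives the changed-line count from the last line of git diff --stat using fixed awk field positions. As an alternative sketch that does not depend on column positions, the same summary line (also produced by git diff --shortstat) can be parsed by label. The function name is hypothetical and not part of the risk-assessment script.

```python
import re


def lines_changed(shortstat: str) -> int:
    """Sum insertions and deletions from a git diff summary line.

    Handles summaries that mention only insertions, only deletions, or neither.
    """
    insertions = re.search(r"(\d+) insertions?\(\+\)", shortstat)
    deletions = re.search(r"(\d+) deletions?\(-\)", shortstat)
    return sum(int(m.group(1)) for m in (insertions, deletions) if m)
```

An empty summary (no changes) yields 0, which matches the script's default for LINES_CHANGED.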
ServiceNow Integration via Service Hook
{
  "publisherId": "rm",
  "eventType": "ms.vss-release.deployment-approval-pending-event",
  "consumerId": "webHooks",
  "consumerActionId": "httpRequest",
  "consumerInputs": {
    "url": "https://acme.service-now.com/api/sn_chg_rest/change",
    "httpHeaders": "Content-Type: application/json\nAuthorization: Bearer {{token}}",
    "resourceDetailsToSend": "all",
    "messagesToSend": "all",
    "detailedMessagesToSend": "all"
  },
  "publisherInputs": {
    "releaseEnvironmentId": "production",
    "releaseDefinitionId": "",
    "projectId": "{{projectId}}"
  }
}
Codified CAB: Environment with Approval Gates
# Deployment stage with codified CAB approval
- stage: ProductionDeploy
  displayName: 'Production Deployment (CAB Gated)'
  dependsOn: StagingValidation
  jobs:
    - deployment: CabApproval
      # Environment 'aks-production' configured with:
      #   - Approval: service-owners group + platform-team
      #   - Branch control: only from refs/heads/main
      #   - Business hours: Mon-Fri 06:00-18:00 CST
      #   - Exclusive lock: one deployment at a time
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
            steps:
              # riskLevel must be supplied to this pipeline (for example as a
              # pipeline variable or a mapped stage output); a variable set with
              # task.setvariable in the PR validation run is not visible here.
              - script: |
                  if [ "$(riskLevel)" = "low" ]; then
                    echo "Low-risk change: auto-approved by codified CAB"
                  fi
                displayName: 'CAB decision log'
              - task: HelmDeploy@0
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'upgrade'
                  chartType: 'FilePath'
                  chartPath: 'charts/$(service)'
                  releaseName: '$(service)'
                  overrideValues: 'image.tag=$(Build.BuildId)'
Glossary
| Term | Definition |
|---|---|
| Branch Policy | Azure DevOps Repos configuration enforcing rules on branch operations (merge requirements, build validation). |
| Build Validation | A required pipeline run triggered on every PR that must succeed before merge is permitted. |
| Required Reviewer | An auto-assigned reviewer based on file path patterns, ensuring domain experts review relevant changes. |
| Status Check | An external validation (SonarQube, security scan) that reports pass/fail to the PR. |
| Squash Merge | Combining all commits in a feature branch into a single commit on the target branch. |
| PR Template | A markdown template pre-populating PR descriptions with checklists and required fields. |
| CODEOWNERS | File path–based automatic reviewer assignment (Azure DevOps uses "Required Reviewer" policies). |
| Codified CAB | An automated Change Advisory Board that uses pipeline gates and risk scores instead of manual meetings. |
| Change Risk Assessment | Automated scoring of a change's risk level based on file paths, blast radius, and service tier. |
| Blast Radius | The number of downstream services or users affected if the change introduces a defect. |
| Change Record | A formal record in ServiceNow/ITSM documenting the change, its risk, and approval history. |
| Service Hook | Azure DevOps webhook that sends event notifications to external services (ServiceNow, Slack). |
| Quality Gate | A set of measurable thresholds (coverage, complexity) that code must meet to proceed. |
| Technical Debt Score | A SonarQube metric estimating the effort required to fix all code maintainability issues. |
Component: PRWorkflowBoard
Screen Layout: Kanban-style board with columns: Draft → In Review → Changes Requested → Approved → Merged. Each PR card shows title, author avatar, reviewer status (pending/approved/rejected), quality gate results (SonarQube, security scan), and age indicator. Cards have colored borders based on risk level (green=low, yellow=medium, red=high).
Component: CABDashboard
Screen Layout: Queue of pending change requests with columns: service, change summary, risk score (color-coded), requested by, approval status, and scheduled deployment window. History section showing past approvals with auto-approve rate percentage. Pie chart breakdown: auto-approved vs. manual approval vs. rejected.
Mock Data
{
"pull_requests": [
{ "id": 1234, "title": "Add passkey authentication flow", "author": "jsmith", "status": "in_review", "reviewers": [{"name": "adoe", "status": "approved"}, {"name": "mchen", "status": "pending"}], "risk_score": 65, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "82%"} },
{ "id": 1235, "title": "Update Helm chart resource limits", "author": "kpatel", "status": "approved", "reviewers": [{"name": "platform-team", "status": "approved"}], "risk_score": 45, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} },
{ "id": 1236, "title": "Fix checkout total calculation", "author": "adoe", "status": "changes_requested", "reviewers": [{"name": "jsmith", "status": "changes_requested"}], "risk_score": 55, "risk_level": "medium", "quality_gates": {"sonarqube": "failed", "security": "passed", "coverage": "75%"} },
{ "id": 1237, "title": "Update README badges", "author": "rlee", "status": "merged", "reviewers": [{"name": "auto-approve", "status": "approved"}], "risk_score": 5, "risk_level": "low", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} }
],
"cab_requests": [
{ "id": "CHG-2026-101", "service": "checkout-service", "summary": "Deploy passkey authentication", "risk_score": 65, "status": "pending", "approvers": ["service-owners", "security-team"] },
{ "id": "CHG-2026-100", "service": "catalog-api", "summary": "Enable vector search ranking", "risk_score": 25, "status": "auto-approved", "approvers": ["auto-cab"] },
{ "id": "CHG-2026-099", "service": "order-service", "summary": "Database schema migration v42", "risk_score": 85, "status": "approved", "approvers": ["service-owners", "platform-team", "dba-team"] },
{ "id": "CHG-2026-098", "service": "notification-svc", "summary": "Update email templates", "risk_score": 15, "status": "auto-approved", "approvers": ["auto-cab"] },
{ "id": "CHG-2026-097", "service": "identity-service", "summary": "Rotate JWT signing keys", "risk_score": 90, "status": "pending", "approvers": ["service-owners", "security-team", "platform-team"] }
]
}
04 Deployment Compliance Gates
Pre-Deployment Compliance Checks Required
Before any deployment to staging or production, the pipeline evaluates a set of compliance gates. All gates must pass — a single failure blocks the deployment.
Deployment Blocking Criteria
- Critical/High vulnerabilities: Defender for DevOps or container scan results with Critical or High CVEs block deployment
- Azure Policy violations: any resource with a deny-effect policy failure
- Missing required tags: cost-center, owner, environment, data-classification
- Dependency SLA: if a critical dependency service is in degraded state, block deployment
- Change window violation: no deployments during blackout periods (weekends, holidays, 10pm–6am CST)
- Test coverage: unit test coverage below 80% on changed files
- Missing runbook: Tier-1 services must have an updated runbook
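The "all gates must pass" rule can be sketched as a simple aggregation over gate results. The type and function names here are illustrative, and treating an empty result set as "not ready" (fail closed) is an assumption rather than something the specification states.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    details: str = ""


def deployment_ready(results: list[GateResult]) -> tuple[bool, list[str]]:
    """Return (ready, failed gate names); a single failed gate blocks deployment."""
    failed = [r.name for r in results if not r.passed]
    # Assumption: no gate results at all is treated as not ready (fail closed).
    ready = bool(results) and not failed
    return ready, failed
```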
Azure Policy for AKS Required
Azure Policy for Kubernetes enforces guardrails on AKS workloads:
- No privileged containers (deny)
- Resource limits required on all containers (deny)
- Only images from approved ACR (deny)
- Required labels: app.kubernetes.io/name, acme.com/team (deny)
- No host network or host PID (deny)
- ReadOnlyRootFilesystem recommended (audit)
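To give a feel for what the deny rules above check, here is a small local pre-check over a pod manifest that has already been parsed into a dictionary. It is only a sketch for early feedback: it is not how Azure Policy for Kubernetes evaluates workloads, and the registry host used in the example is a placeholder.

```python
REQUIRED_LABELS = ("app.kubernetes.io/name", "acme.com/team")


def pod_violations(pod: dict, approved_registry: str) -> list[str]:
    """Return human-readable violations of the deny-level guardrails for one pod."""
    violations = []
    labels = pod.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            violations.append(f"missing label {label}")

    spec = pod.get("spec", {})
    if spec.get("hostNetwork") or spec.get("hostPID"):
        violations.append("host network or host PID enabled")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("securityContext", {}).get("privileged"):
            violations.append(f"{name}: privileged container")
        if not container.get("resources", {}).get("limits"):
            violations.append(f"{name}: no resource limits")
        if not container.get("image", "").startswith(approved_registry + "/"):
            violations.append(f"{name}: image not from {approved_registry}")
    return violations
```

Calling `pod_violations(manifest, "acme.azurecr.io")` on a compliant pod returns an empty list; the audit-level ReadOnlyRootFilesystem recommendation is deliberately left out because it does not block deployment.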
Defender for Cloud Integration
Defender for Cloud security recommendations must be addressed before promotion to production. The Secure Score must remain above the team's minimum threshold (default: 80/100).
Compliance as Code Recommended
All Azure Policy definitions stored as Terraform in Git. Changes to policies follow the same PR review process as application code. Policy assignments versioned and deployed via pipeline.
Deployment Windows & Blackout Periods
| Window | Schedule | Allowed |
|---|---|---|
| Standard | Mon–Fri 6:00–18:00 CST | All deployments |
| Extended | Mon–Fri 18:00–22:00 CST | Tier-2/3 only, with approval |
| Blackout | Weekends, holidays, 22:00–6:00 | Emergency hotfixes only (P1) |
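The window table can be sketched as two small functions: one classifies a timestamp, the other applies the "Allowed" column. The sketch assumes the times are US Central (America/Chicago), that the caller passes a timezone-aware datetime, and that holidays are supplied by the caller; the function and parameter names are illustrative.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")


def deployment_window(now: datetime, holidays: frozenset[date] = frozenset()) -> str:
    """Classify a timezone-aware timestamp as standard, extended, or blackout."""
    local = now.astimezone(CENTRAL)
    if local.isoweekday() > 5 or local.date() in holidays:
        return "blackout"
    if 6 <= local.hour < 18:
        return "standard"
    if 18 <= local.hour < 22:
        return "extended"
    return "blackout"


def deployment_allowed(window: str, tier: int, approved: bool = False,
                       p1_hotfix: bool = False) -> bool:
    """Apply the Allowed column of the deployment window table."""
    if window == "standard":
        return True
    if window == "extended":
        return tier in (2, 3) and approved
    return p1_hotfix  # blackout: emergency P1 hotfixes only
```

Unlike a plain hour check, this distinguishes the extended window, so a Tier-1 deployment at 19:00 is rejected even though it falls before the 22:00 blackout.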
Post-Deployment Validation
- Smoke tests: automated HTTP health checks against deployed endpoints
- Synthetic monitoring: Application Insights availability tests for critical user flows
- Automatic rollback: if smoke tests fail, pipeline triggers Helm rollback
- Soak period: 30-minute monitoring window before declaring deployment successful
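The rollback decision described above can be sketched as one function that combines smoke-test results with the error rate observed during the soak period. The 1% default mirrors the threshold used by the soak check in this specification; the function name and the handling of a window with no traffic are assumptions.

```python
def should_roll_back(
    smoke_statuses: dict[str, int],
    failed_requests: int,
    total_requests: int,
    max_error_rate: float = 0.01,
) -> bool:
    """Decide whether to trigger a rollback after deployment.

    smoke_statuses maps each health endpoint to the HTTP status it returned.
    """
    # Any non-200 smoke test result triggers rollback immediately.
    if any(status != 200 for status in smoke_statuses.values()):
        return True
    # Assumption: no traffic during the soak window is not treated as a failure.
    if total_requests == 0:
        return False
    return failed_requests / total_requests > max_error_rate
```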
Audit Trail Required
Every deployment produces an immutable audit record in Azure DevOps + Log Analytics containing: who approved, what changed, when deployed, why (linked work item), risk score, and compliance gate results.
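A minimal sketch of what one audit record might look like before it is shipped to Log Analytics. The field names are assumptions loosely based on the who/what/when/why list above. The frozen dataclass only prevents accidental reassignment in the pipeline process; actual immutability depends on how the log store is configured.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentAuditRecord:
    service: str
    environment: str
    version: str                      # what changed
    deployed_by: str
    approved_by: list[str]            # who approved
    work_item_id: str                 # why (linked work item)
    risk_score: int
    compliance_gates: dict[str, str]  # gate name -> "passed" / "failed"
    timestamp: str = field(           # when deployed (UTC, ISO 8601)
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_entry(record: DeploymentAuditRecord) -> str:
    """Serialize the record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```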
Azure Policy: Deny Untagged Resources
# policies/require-tags.tf
resource "azurerm_policy_definition" "require_tags" {
  name         = "require-mandatory-tags"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Require mandatory resource tags"
  description  = "Denies resource creation without required tags"

  policy_rule = jsonencode({
    if = {
      anyOf = [
        { field = "tags['cost-center']", exists = false },
        { field = "tags['owner']", exists = false },
        { field = "tags['environment']", exists = false },
        { field = "tags['data-classification']", exists = false }
      ]
    }
    then = { effect = "deny" }
  })
}

resource "azurerm_policy_definition" "aks_no_privileged" {
  name         = "aks-deny-privileged-containers"
  policy_type  = "Custom"
  mode         = "Microsoft.Kubernetes.Data"
  display_name = "AKS: Deny privileged containers"

  policy_rule = jsonencode({
    if = {
      field  = "type"
      equals = "Microsoft.ContainerService/managedClusters"
    }
    then = {
      effect = "deny"
      details = {
        templateInfo = {
          sourceType = "PublicURL"
          url        = "https://store.policy.core.windows.net/kubernetes/container-no-privilege/v2/template.yaml"
        }
        constraint = "https://store.policy.core.windows.net/kubernetes/container-no-privilege/v2/constraint.yaml"
      }
    }
  })
}
Azure Policy Initiative: Deployment Readiness
# policies/deployment-readiness-initiative.tf
resource "azurerm_policy_set_definition" "deployment_readiness" {
  name         = "deployment-readiness-v1"
  policy_type  = "Custom"
  display_name = "Deployment Readiness Initiative"
  description  = "All policies that must pass before production deployment"

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.require_tags.id
    reference_id         = "requireTags"
  }

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.aks_no_privileged.id
    reference_id         = "aksNoPrivileged"
  }

  policy_definition_reference {
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/febd0533-8e55-448f-b837-bd0e06f16469"
    reference_id         = "aksResourceLimits"
  }

  policy_definition_reference {
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/d2d3ab89-5a4e-4921-81fa-3c2c0a69fc6b"
    reference_id         = "aksApprovedImages"
  }
}

resource "azurerm_subscription_policy_assignment" "readiness" {
  name                 = "deploy-readiness"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_set_definition.deployment_readiness.id
  display_name         = "Deployment Readiness Checks"
  enforce              = true
}
Pre-Deployment Compliance Check Pipeline
# pipelines/compliance-gate.yaml
stages:
  - stage: ComplianceGate
    displayName: 'Pre-Deployment Compliance'
    jobs:
      - job: PolicyCheck
        displayName: 'Azure Policy Compliance'
        steps:
          - task: AzureCLI@2
            displayName: 'Check policy compliance'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                # Get non-compliant resources in target resource group.
                # The summary's "results" field is an object, not an array.
                NON_COMPLIANT=$(az policy state summarize \
                  --resource-group "rg-$(environment)-$(service)" \
                  --query "results.nonCompliantResources" -o tsv)
                if [ "${NON_COMPLIANT:-0}" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]$NON_COMPLIANT non-compliant resources found"
                  az policy state list \
                    --resource-group "rg-$(environment)-$(service)" \
                    --filter "complianceState eq 'NonCompliant'" \
                    --query "[].{resource:resourceId, policy:policyDefinitionName}" -o table
                  exit 1
                fi
                echo "All resources compliant"

      - job: VulnerabilityCheck
        displayName: 'Defender Vulnerability Check'
        steps:
          - task: AzureCLI@2
            displayName: 'Check Defender findings'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                CRITICAL=$(az security sub-assessment list \
                  --assessed-resource-id "/subscriptions/$(subscriptionId)" \
                  --query "length([?status.severity=='High' || status.severity=='Critical'])" -o tsv)
                if [ "${CRITICAL:-0}" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]$CRITICAL critical/high vulnerabilities found"
                  exit 1
                fi

      - job: BlackoutCheck
        displayName: 'Deployment Window Check'
        steps:
          - script: |
              HOUR=$(TZ="America/Chicago" date +%H)
              DAY=$(TZ="America/Chicago" date +%u)
              if [ "$DAY" -gt 5 ] || [ "$HOUR" -lt 6 ] || [ "$HOUR" -ge 22 ]; then
                echo "##vso[task.logissue type=error]Deployment blocked: outside deployment window"
                echo "Current time: $(TZ='America/Chicago' date)"
                echo "Allowed: Mon-Fri 06:00-22:00 CST"
                exit 1
              fi
              echo "Within deployment window"
            displayName: 'Check deployment window'
Post-Deployment Smoke Test & Rollback
# pipelines/post-deploy-validation.yaml
- stage: SmokeTest
  displayName: 'Post-Deployment Validation'
  dependsOn: ProductionDeploy
  jobs:
    - job: SmokeTests
      steps:
        - script: |
            echo "Running smoke tests against $(service).$(domain)"
            for endpoint in /health /ready /api/status; do
              STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
                "https://$(service).$(domain)${endpoint}")
              if [ "$STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Smoke test failed: ${endpoint} returned $STATUS"
                exit 1
              fi
              echo "✓ ${endpoint}: $STATUS"
            done
          displayName: 'HTTP smoke tests'
        - script: |
            echo "Soak period: monitoring for 30 minutes..."
            sleep 1800
            # Check Application Insights for error spike
            ERROR_RATE=$(az monitor app-insights query \
              --app "$(appInsightsName)" \
              --analytics-query "requests | where timestamp > ago(30m) | summarize errorRate=todouble(countif(success==false))/count()" \
              --query "tables[0].rows[0][0]" -o tsv)
            if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
              echo "##vso[task.logissue type=error]Error rate $ERROR_RATE exceeds threshold"
              exit 1
            fi
          displayName: 'Soak period monitoring'

- stage: AutoRollback
  displayName: 'Auto-Rollback on Failure'
  dependsOn: SmokeTest
  condition: failed()
  jobs:
    - deployment: Rollback
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
            steps:
              - task: HelmDeploy@0
                displayName: 'Helm rollback'
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'rollback'
                  releaseName: '$(service)'
                  arguments: '--wait --timeout 5m'
              - script: |
                  echo "##vso[task.logissue type=warning]ROLLBACK EXECUTED for $(service)"
                  # Notify Teams channel
                  curl -H 'Content-Type: application/json' \
                    -d '{"text":"⚠️ Auto-rollback executed for $(service) in production"}' \
                    "$(teamsWebhookUrl)"
                displayName: 'Notify rollback'
Log Analytics: Deployment Audit Trail (KQL)
// Deployment Audit Trail: KQL query
// Run in Log Analytics workspace
DeploymentAudit_CL
| where TimeGenerated > ago(30d)
| project
    TimeGenerated,
    Service = service_s,
    Environment = environment_s,
    Version = version_s,
    DeployedBy = deployed_by_s,
    ApprovedBy = approved_by_s,
    RiskScore = risk_score_d,
    RiskLevel = risk_level_s,
    PipelineRunId = pipeline_run_id_s,
    WorkItemId = work_item_id_s,
    ComplianceGates = compliance_gates_s,
    Duration = duration_s,
    Status = status_s
| order by TimeGenerated desc

// Compliance Gate Summary: last 7 days
DeploymentAudit_CL
| where TimeGenerated > ago(7d)
| extend Gates = parse_json(compliance_gates_s)
| mv-expand Gate = Gates
| summarize
    PassCount = countif(Gate.status == "passed"),
    FailCount = countif(Gate.status == "failed")
    by GateName = tostring(Gate.name)
| extend PassRate = round(todouble(PassCount) / (PassCount + FailCount) * 100, 1)
| order by PassRate asc
Azure Monitor: Deployment Health Alert
# modules/monitoring/deployment-alert.tf
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "deploy_health" {
  name                 = "alert-deployment-error-spike-${var.service_name}"
  resource_group_name  = var.resource_group_name
  location             = var.location
  description          = "Error rate spike detected after deployment"
  severity             = 1
  enabled              = true
  scopes               = [var.application_insights_id]
  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"

  criteria {
    query = <<-KQL
      requests
      | where timestamp > ago(15m)
      | summarize
          errorRate = todouble(countif(success == false)) / count(),
          totalRequests = count()
      | where errorRate > 0.01 and totalRequests > 100
    KQL

    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [var.action_group_id]
  }

  tags = var.common_tags
}
Glossary
| Term | Definition |
|---|---|
| Compliance Gate | An automated check that must pass before a deployment is permitted to proceed. |
| Azure Policy | Azure-native governance service that enforces organizational rules on resources at scale. |
| Policy Initiative | A collection of Azure Policy definitions grouped and assigned together as a single unit. |
| Deny Effect | Azure Policy effect that blocks resource creation or update if the rule is violated. |
| Audit Effect | Azure Policy effect that flags non-compliance without blocking the operation. |
| Defender for Cloud | Unified security posture management and threat protection across Azure resources. |
| Secure Score | A Defender for Cloud metric (0–100) representing the security posture of your subscriptions. |
| Deployment Readiness | The state where all pre-deployment gates (tests, scans, policies, approvals) have passed. |
| Blackout Window | A time period during which production deployments are prohibited (weekends, holidays, late night). |
| Smoke Test | Minimal health-check tests run immediately after deployment to validate basic functionality. |
| Synthetic Monitor | Application Insights availability tests that simulate user flows at regular intervals. |
| Rollback Trigger | An automated condition (error rate spike, smoke test failure) that initiates deployment rollback. |
| Change Window | The approved time range during which deployments are permitted. |
| Data Classification Tag | A required resource tag indicating the sensitivity level of data (public, internal, confidential, restricted). |
| Immutable Audit Log | A tamper-proof record of all deployment actions stored in Log Analytics with retention policies. |
| KQL | Kusto Query Language — the query language used in Azure Log Analytics and Application Insights. |
Component: ComplianceGateway
Screen Layout: A vertical checklist for a specific deployment, with each gate showing: gate name, status (pass/fail/pending), details, and timestamp. Gates include: Azure Policy compliance, vulnerability scan, tag validation, deployment window check, test coverage, runbook verification, and approval status. Overall readiness indicator (green checkmark or red block) at the top.
Component: PolicyDashboard
Screen Layout: Azure Policy compliance overview with compliance percentage per resource group. Drill-down table showing individual policy violations with resource name, policy name, effect, and remediation guidance. Filter by subscription, resource group, and policy category.
Component: DeploymentAuditLog
Screen Layout: Searchable, sortable table of all deployments with columns: timestamp, service, environment, version, deployed by, risk score, compliance gates summary, duration, and status. Filter by service, environment, date range, and status. Export to CSV.
Mock Data
{
"services": [
{ "name": "checkout-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 92, "deployment_ready": true },
{ "name": "catalog-api", "policy_compliance": 95, "vuln_count": 2, "tag_compliance": 100, "secure_score": 85, "deployment_ready": false },
{ "name": "order-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 95, "secure_score": 88, "deployment_ready": false },
{ "name": "identity-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 95, "deployment_ready": true },
{ "name": "notification-svc", "policy_compliance": 100, "vuln_count": 1, "tag_compliance": 100, "secure_score": 90, "deployment_ready": true },
{ "name": "payment-processor", "policy_compliance": 98, "vuln_count": 0, "tag_compliance": 100, "secure_score": 91, "deployment_ready": true }
],
"audit_log": [
{ "timestamp": "2026-02-28T14:32:00Z", "service": "checkout-service", "env": "production", "version": "3.4.1", "deployed_by": "jsmith", "risk_score": 35, "gates": "6/6 passed", "duration": "14m", "status": "succeeded" },
{ "timestamp": "2026-02-28T11:15:00Z", "service": "catalog-api", "env": "staging", "version": "2.8.0", "deployed_by": "adoe", "risk_score": 50, "gates": "5/6 passed", "duration": "11m", "status": "failed" },
{ "timestamp": "2026-02-27T16:45:00Z", "service": "order-service", "env": "production", "version": "4.1.2", "deployed_by": "mchen", "risk_score": 25, "gates": "6/6 passed", "duration": "16m", "status": "succeeded" },
{ "timestamp": "2026-02-27T10:20:00Z", "service": "identity-service", "env": "production", "version": "1.9.0", "deployed_by": "kpatel", "risk_score": 70, "gates": "6/6 passed", "duration": "22m", "status": "succeeded" },
{ "timestamp": "2026-02-26T15:00:00Z", "service": "notification-svc", "env": "production", "version": "2.3.5", "deployed_by": "rlee", "risk_score": 15, "gates": "6/6 passed", "duration": "9m", "status": "succeeded" }
]
}
Unified Demo Application Specification
Acme Feature Console (Azure)
Tech Stack: React 18, TypeScript, Tailwind CSS, Recharts, Mock API layer
The unified demo application brings together all four modules into a single dashboard experience for managing the complete feature lifecycle.
Key Screens
- Feature Flag Dashboard — All flags with state, targeting rules, rollout percentages, and TTL tracking
- Canary Monitor — Real-time canary deployment health, traffic weight, and success rate metrics
- Pipeline Dashboard — Pipeline runs, stage status, success rates, and deployment frequency
- PR Workflow Board — Kanban view of PRs with quality gates and review status
- CAB Dashboard — Change requests with risk scores, auto-approval history, and approval queue
- Compliance Gateway — Pre-deployment checklist with gate pass/fail status
- Deployment Audit Log — Searchable history of all deployments with compliance data
Architecture Overview
// Component Tree
App
├── Layout
│ ├── TopBar (theme toggle, user menu)
│ └── Sidebar (navigation)
├── Pages
│ ├── FeatureFlagDashboard
│ │ ├── FlagTable (sortable, filterable)
│ │ ├── FlagDetail (targeting rules, audit log)
│ │ └── CreateFlagModal
│ ├── CanaryMonitor
│ │ ├── CanaryCard (per deployment)
│ │ ├── MetricsChart (Recharts)
│ │ └── RollbackButton
│ ├── PipelineDashboard
│ │ ├── PipelineRunTable
│ │ ├── StageIndicator
│ │ └── DeploymentTimeline
│ ├── PRWorkflowBoard
│ │ ├── KanbanColumn (per status)
│ │ ├── PRCard
│ │ └── QualityGateBadges
│ ├── CABDashboard
│ │ ├── ChangeRequestQueue
│ │ ├── RiskScoreBadge
│ │ └── ApprovalHistory
│ ├── ComplianceGateway
│ │ ├── GateChecklist
│ │ ├── PolicyDashboard
│ │ └── ReadinessIndicator
│ └── AuditLog
│ ├── AuditTable (searchable)
│ └── ExportButton
└── Services
├── apiClient.ts (mock fetch layer)
├── flagService.ts
├── pipelineService.ts
└── complianceService.ts
Data Model
// Core Entities
FeatureFlag: { name, status, rollout_pct, env, targeting, ttl_days_remaining, created, team }
CanaryDeployment: { service, phase, weight_pct, success_rate, p99_latency, started_at }
PipelineRun: { id, pipeline, trigger, branch, stages, duration_min, actor, started_at }
PullRequest: { id, title, author, status, reviewers[], risk_score, quality_gates }
ChangeRequest: { id, service, summary, risk_score, risk_level, status, approvers[] }
ComplianceGate: { name, status, details, timestamp }
PolicyViolation: { resource_id, policy_name, effect, severity, detected_at }
DeploymentAudit: { timestamp, service, env, version, deployed_by, risk_score, gates, status }
ServiceHealth: { name, policy_compliance, vuln_count, secure_score, deployment_ready }
Full API Schema
| Method | Endpoint | Module | Description |
|---|---|---|---|
| GET | /api/flags | M1 | List all feature flags with state and targeting |
| POST | /api/flags | M1 | Create a new feature flag |
| PUT | /api/flags/{name} | M1 | Update flag state or targeting rules |
| DELETE | /api/flags/{name} | M1 | Retire and remove a feature flag |
| GET | /api/canary/{service} | M1 | Get canary deployment status |
| POST | /api/canary/{service}/rollback | M1 | Trigger canary rollback |
| GET | /api/pipelines/runs | M2 | List pipeline runs with stage status |
| GET | /api/pipelines/runs/{id} | M2 | Get detailed pipeline run info |
| GET | /api/pipelines/metrics | M2 | Pipeline success rate, deployment frequency |
| POST | /api/pipelines/trigger | M2 | Manually trigger a pipeline run |
| GET | /api/prs | M3 | List PRs with review and quality gate status |
| GET | /api/prs/{id}/risk | M3 | Get risk assessment for a PR |
| GET | /api/cab/queue | M3 | List pending CAB change requests |
| POST | /api/cab/{id}/approve | M3 | Approve a change request |
| GET | /api/compliance/readiness/{service} | M4 | Get deployment readiness checklist |
| GET | /api/compliance/policies | M4 | List policy compliance per resource group |
| GET | /api/compliance/violations | M4 | List active policy violations |
| GET | /api/audit/deployments | M4 | Query deployment audit trail |
| GET | /api/services | All | List all services with health overview |
| GET | /api/services/{name}/health | All | Detailed service health and compliance |
Azure → AWS Service Mapping
| Capability | Azure Version | AWS Equivalent |
|---|---|---|
| Feature Flags | Azure App Configuration Feature Manager | AWS AppConfig Feature Flags |
| CI/CD Pipelines | Azure DevOps Pipelines (YAML) | AWS CodePipeline / GitHub Actions |
| Container Registry | Azure Container Registry (ACR) | Amazon ECR |
| Kubernetes | Azure Kubernetes Service (AKS) | Amazon EKS |
| Policy Engine | Azure Policy | AWS Config Rules / SCPs |
| Security Posture | Microsoft Defender for Cloud | AWS Security Hub |
| Secrets Management | Azure Key Vault | AWS Secrets Manager |
| Monitoring / APM | Application Insights + Azure Monitor | CloudWatch + X-Ray |
| Traffic Management | Azure Traffic Manager | Route 53 weighted routing |
| Service Mesh | Istio on AKS (Istio-based service mesh add-on) | AWS App Mesh |
| Artifact Repository | Azure Artifacts | AWS CodeArtifact |
| Work Item Tracking | Azure Boards | Jira / AWS CodeCatalyst |
| Code Repository | Azure DevOps Repos | AWS CodeCommit / GitHub |
| Log Analytics | Azure Log Analytics (KQL) | CloudWatch Logs Insights |