Feature Lifecycle Phases

Every feature follows a structured lifecycle from inception to retirement. Each phase maps to specific IDP modules, Azure DevOps pipelines, App Configuration feature flags, and compliance gates.

  • Author: PR & Code Review
  • Build: CI/CD Pipeline
  • Gate: Compliance Check
  • Flag: Controlled Rollout
  • Retire: Cleanup Flags

Lifecycle phases explained

What is done in each phase of the feature lifecycle on Azure.

Author

Code and changes are authored and submitted via pull requests. Branch policies and code review (e.g. in Azure Repos or GitHub) ensure quality and alignment with standards before merging.

Build

Azure DevOps Pipelines (or equivalent) run CI: build, test, and package the application. Artifacts are published; container images may be built and pushed to ACR. Quality and security gates run in the pipeline.

Gate

Pre-deployment compliance checks: Azure Policy, Defender for Cloud findings, and pipeline gates must pass. Deployment windows and approval steps ensure only compliant changes are released.

Flag

Feature flags are used for controlled rollout (e.g. Azure App Configuration Feature Manager). Teams target audiences, set rollout percentages, and can disable a feature with a kill switch or roll it back without redeploying.

Retire

Feature flags and related configuration are cleaned up once the feature is fully rolled out or deprecated. Flags are removed from App Configuration and code paths are simplified.

Target Platform

This specification targets Microsoft Azure with Azure DevOps Pipelines for CI/CD, Azure App Configuration Feature Manager for feature flags, AKS for compute, and Azure Policy + Defender for Cloud for compliance gates.

Module Summary

01 Feature Flagging & Controlled Rollouts

Feature Flag Lifecycle

Every feature flag follows a strict lifecycle managed through Azure App Configuration Feature Manager. Flags transition through four phases:

  • Create: Define flag & filters
  • Target: Percentage/user targeting
  • Rollout: Progressive 1→100%
  • Retire: Remove flag debt

Azure App Configuration Feature Manager (Required)

All feature flags are centrally managed in Azure App Configuration using the Feature Manager capability. This provides a single source of truth for flag state across all environments.

  • Targeting filter: percentage-based rollouts and user/group targeting
  • Time window filter: auto-enable/disable flags within a date range
  • Custom filters: evaluate flag state based on arbitrary application context
  • Feature flags are environment-scoped — separate App Configuration instances per environment
  • Flag changes require PR approval in the configuration-as-code repo
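
Percentage-based targeting, as listed above, only behaves predictably if each user lands in a stable bucket, so that someone who sees a feature at 5% still sees it at 25%. The sketch below illustrates that idea under stated assumptions: it is not the Feature Manager's actual hashing scheme, and `rollout_bucket` and `is_targeted` are hypothetical names introduced here for illustration.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable value in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}\n{user_id}".encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer and scale to a percentage.
    return int.from_bytes(digest[:4], "big") / 2**32 * 100


def is_targeted(flag_name: str, user_id: str, rollout_percentage: float) -> bool:
    """Return True when the user's bucket falls below the rollout percentage."""
    return rollout_bucket(flag_name, user_id) < rollout_percentage


if __name__ == "__main__":
    flag = "payments.checkout.new-ui"
    users = [f"user-{i}" for i in range(10_000)]
    enabled = sum(is_targeted(flag, user, 25) for user in users)
    print(f"{enabled / len(users):.1%} of users enabled at a 25% rollout")
```

Including the flag name in the hash means a user's bucket differs from flag to flag, so the same early cohort is not exposed to every experiment.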

Deployment Strategies on AKS

Canary Deployment (Flagger + Istio)

Use Flagger with Istio service mesh for automated canary analysis. Traffic is gradually shifted from the stable to canary version based on success rate and latency metrics.

  • Canary weight progression: 1% → 5% → 25% → 50% → 100%
  • Automatic rollback if error rate exceeds threshold (default 1%)
  • Canary analysis interval: 60 seconds
  • Metrics: request success rate, p99 latency from Application Insights
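
The progression and rollback rule above can be modelled in a few lines. This is a simplified sketch, not Flagger's implementation: a real analysis tolerates a configurable number of failed checks before rolling back, whereas this version rolls back on the first bad step. `run_canary` and `error_rate_at` are hypothetical names.

```python
from typing import Callable, List, Tuple

WEIGHTS = [1, 5, 25, 50, 100]  # progression stated in this section
MAX_ERROR_RATE = 0.01          # 1% rollback threshold stated in this section


def run_canary(error_rate_at: Callable[[int], float]) -> Tuple[str, List[int]]:
    """Advance through WEIGHTS; roll back at the first step whose error rate is too high."""
    visited: List[int] = []
    for weight in WEIGHTS:
        visited.append(weight)
        if error_rate_at(weight) > MAX_ERROR_RATE:
            return "rolled_back", visited
    return "promoted", visited


if __name__ == "__main__":
    # A healthy release reaches 100%; one that degrades under load stops at 25%.
    print(run_canary(lambda weight: 0.002))
    print(run_canary(lambda weight: 0.05 if weight >= 25 else 0.0))
```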

Blue-Green Deployment

Run two identical environments (blue/green) behind Azure Traffic Manager or Nginx Ingress canary annotations. Switch traffic atomically after validation.

  • Azure Traffic Manager weighted routing for DNS-level traffic split
  • Nginx Ingress canary-weight annotation for cluster-level routing
  • Instant rollback by reverting traffic weight to 0

Flag Naming Conventions (Required)

All feature flags follow the pattern: <team>.<service>.<feature>

Example                              Description
payments.checkout.new-ui             New checkout UI for payments team
catalog.search.vector-ranking        Vector-based search ranking experiment
orders.fulfillment.batch-processing  Batch processing mode for fulfillment
identity.auth.passkey-login          Passkey authentication experiment
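
A naming convention like this is easiest to enforce with an automated check, for example in the PR validation for the configuration repo. The sketch below assumes each segment is lowercase kebab-case, which matches the examples but is not stated as a rule in this specification; `parse_flag_name` is a hypothetical helper.

```python
import re

# One lowercase kebab-case segment, e.g. "checkout" or "new-ui" (assumed rule).
_SEGMENT = r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*"
FLAG_NAME = re.compile(rf"{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}")


def parse_flag_name(name: str) -> dict:
    """Split <team>.<service>.<feature> into its parts, or raise ValueError."""
    if not FLAG_NAME.fullmatch(name):
        raise ValueError(f"flag name {name!r} does not match <team>.<service>.<feature>")
    team, service, feature = name.split(".")
    return {"team": team, "service": service, "feature": feature}


if __name__ == "__main__":
    print(parse_flag_name("payments.checkout.new-ui"))
```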

Flag Hygiene & Governance

  • TTL policy: every flag must have a maximum lifespan (default 90 days)
  • Stale flag automation: flags exceeding TTL trigger alerts and auto-create cleanup tickets
  • Audit logging: all flag state changes recorded in Azure Monitor
  • Flag debt dashboard: weekly report of active flags by team, age, and status
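
The TTL policy above reduces to a date comparison. The following sketch shows how stale flags might be identified before alerts or cleanup tickets are raised; the `Flag` type and `stale_flags` function are hypothetical, and only the 90-day default comes from this section.

```python
from datetime import date, timedelta
from typing import Iterable, List, NamedTuple

DEFAULT_TTL_DAYS = 90  # default maximum lifespan stated in this section


class Flag(NamedTuple):
    name: str
    created: date
    ttl_days: int = DEFAULT_TTL_DAYS


def stale_flags(flags: Iterable[Flag], today: date) -> List[str]:
    """Return the names of flags whose age exceeds their TTL."""
    return [f.name for f in flags if today - f.created > timedelta(days=f.ttl_days)]


if __name__ == "__main__":
    flags = [
        Flag("orders.fulfillment.batch-processing", date(2025, 10, 1)),
        Flag("payments.checkout.new-ui", date(2025, 12, 1)),
    ]
    print(stale_flags(flags, today=date(2026, 1, 15)))
```

Passing `today` explicitly keeps the check deterministic, which makes it straightforward to test in a scheduled job.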

Emergency Kill Switches (Required)

Production Incident Protocol

Every feature deployed behind a flag must have a kill switch that disables the feature without a deployment. Kill switches are toggled via the Azure App Configuration REST API or the Azure Portal and take effect when each service next refreshes its configuration, that is, within the configured refresh interval (30 seconds in the Python integration in this module).
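
In App Configuration, a feature flag is stored as a key-value whose key starts with `.appconfig.featureflag/` and whose value is a JSON document with an `enabled` field. A kill switch can therefore be sketched as flipping that one field while leaving the filters untouched, so re-enabling the flag later restores the previous rollout. The helper names below are hypothetical, and writing the result back to the store (via the REST API or an SDK) is deliberately left out.

```python
import json

FEATURE_FLAG_KEY_PREFIX = ".appconfig.featureflag/"
FEATURE_FLAG_CONTENT_TYPE = "application/vnd.microsoft.appconfig.ff+json;charset=utf-8"


def feature_flag_key(flag_name: str) -> str:
    """Return the key under which App Configuration stores a feature flag."""
    return FEATURE_FLAG_KEY_PREFIX + flag_name


def disable_flag(value_json: str) -> str:
    """Return the flag's JSON value with 'enabled' set to false and filters preserved."""
    flag = json.loads(value_json)
    flag["enabled"] = False
    return json.dumps(flag)


if __name__ == "__main__":
    current = json.dumps({
        "id": "payments.checkout.new-ui",
        "enabled": True,
        "conditions": {"client_filters": [{
            "name": "Microsoft.Targeting",
            "parameters": {"Audience": {"Users": [], "Groups": [], "DefaultRolloutPercentage": 50}},
        }]},
    })
    print(feature_flag_key("payments.checkout.new-ui"))
    print(disable_flag(current))
```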

Observability Integration

Correlate feature flag state with application metrics using Application Insights custom dimensions. Tagging every request with its active feature flags enables:

  • Error rate comparison: flag-on vs. flag-off cohorts
  • Performance impact: latency delta per flag
  • Business metrics: conversion rate per variant
  • Custom KQL queries in Log Analytics for flag-based analysis
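
The flag-on versus flag-off comparison above is a grouped error rate. In practice this would be a KQL query over request telemetry; the sketch below shows the same calculation on in-memory records so the logic is easy to follow. The `(flag_on, failed)` record shape and the `error_rate_by_cohort` name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def error_rate_by_cohort(requests: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compute the failure rate per cohort from (flag_on, failed) pairs."""
    totals = {"flag_on": 0, "flag_off": 0}
    failures = {"flag_on": 0, "flag_off": 0}
    for flag_on, failed in requests:
        cohort = "flag_on" if flag_on else "flag_off"
        totals[cohort] += 1
        failures[cohort] += int(failed)
    # Cohorts with no traffic are omitted rather than reported as 0%.
    return {cohort: failures[cohort] / totals[cohort] for cohort in totals if totals[cohort]}


if __name__ == "__main__":
    sample = [(True, False)] * 3 + [(True, True)] + [(False, False)] * 5
    print(error_rate_by_cohort(sample))
```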

Terraform: Azure App Configuration with Feature Flags

modules/feature-flags/main.tf
# modules/feature-flags/main.tf
resource "azurerm_app_configuration" "flags" {
  name                = "appconf-${var.environment}-${var.service_name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"

  identity {
    type = "SystemAssigned"
  }

  tags = var.common_tags
}

resource "azurerm_app_configuration_feature" "feature" {
  for_each               = { for f in var.feature_flags : f.name => f }
  configuration_store_id = azurerm_app_configuration.flags.id
  name                   = each.value.name
  label                  = var.environment
  enabled                = each.value.enabled
  description            = each.value.description

  dynamic "targeting_filter" {
    for_each = each.value.targeting != null ? [each.value.targeting] : []
    content {
      default_rollout_percentage = targeting_filter.value.percentage
      dynamic "groups" {
        for_each = targeting_filter.value.groups
        content {
          name               = groups.value.name
          rollout_percentage = groups.value.percentage
        }
      }
    }
  }

  dynamic "timewindow_filter" {
    for_each = each.value.time_window != null ? [each.value.time_window] : []
    content {
      start = timewindow_filter.value.start
      end   = timewindow_filter.value.end
    }
  }
}

# Grant AKS managed identity read access
resource "azurerm_role_assignment" "aks_reader" {
  scope                = azurerm_app_configuration.flags.id
  role_definition_name = "App Configuration Data Reader"
  principal_id         = var.aks_managed_identity_principal_id
}

Azure DevOps Pipeline: Canary Deployment with Flagger

pipelines/canary-deploy.yaml
# pipelines/canary-deploy.yaml
trigger:
  branches:
    include: [main]

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: acr-credentials
  - name: acrName
    value: 'acmeacr'
  - name: imageRepository
    value: '$(Build.Repository.Name)'
  - name: tag
    value: '$(Build.BuildId)'

stages:
  - stage: Build
    displayName: 'Build & Push Image'
    jobs:
      - job: BuildImage
        steps:
          - task: Docker@2
            displayName: 'Build and push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(imageRepository)'
              command: 'buildAndPush'
              Dockerfile: '**/Dockerfile'
              tags: |
                $(tag)
                latest

  - stage: DeployCanary
    displayName: 'Deploy Canary to AKS'
    dependsOn: Build
    jobs:
      - deployment: CanaryDeploy
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@1
                  displayName: 'Update canary image'
                  inputs:
                    action: 'deploy'  # deploy substitutes the image listed under containers into the manifests
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    containers: |
                      $(acrName).azurecr.io/$(imageRepository):$(tag)
                    manifests: |
                      k8s/deployment.yaml

  - stage: ValidateCanary
    displayName: 'Validate Canary Health'
    dependsOn: DeployCanary
    jobs:
      - job: HealthCheck
        steps:
          - script: |
              echo "Waiting for Flagger canary analysis..."
              kubectl -n $(team) wait canary/$(service) \
                --for=condition=promoted --timeout=600s
            displayName: 'Wait for Flagger promotion'
          - script: |
              CANARY_STATUS=$(kubectl -n $(team) get canary/$(service) \
                -o jsonpath='{.status.phase}')
              if [ "$CANARY_STATUS" != "Succeeded" ]; then
                echo "##vso[task.logissue type=error]Canary failed: $CANARY_STATUS"
                exit 1
              fi
            displayName: 'Verify canary status'

Blue-Green Deployment Pipeline

pipelines/blue-green-deploy.yaml
# pipelines/blue-green-deploy.yaml
stages:
  - stage: DeployGreen
    displayName: 'Deploy to Green Slot'
    jobs:
      - deployment: GreenDeploy
        environment: 'aks-production.green'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  displayName: 'Helm upgrade green'
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)-green'
                    overrideValues: |
                      image.tag=$(tag)
                      slot=green

  - stage: SmokeTestGreen
    displayName: 'Smoke Test Green'
    dependsOn: DeployGreen
    jobs:
      - job: SmokeTest
        steps:
          - script: |
              curl -sf http://$(service)-green.$(team).svc.cluster.local/health
            displayName: 'Health check green'

  - stage: SwitchTraffic
    displayName: 'Switch Traffic to Green'
    dependsOn: SmokeTestGreen
    jobs:
      - deployment: TrafficSwitch
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: |
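                    # The canary annotations belong on the green Ingress; this assumes the chart names it $(service)-green
                    kubectl -n $(team) patch ingress $(service)-green \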
                      -p '{"metadata":{"annotations":{
                        "nginx.ingress.kubernetes.io/canary":"true",
                        "nginx.ingress.kubernetes.io/canary-weight":"100"
                      }}}'
                  displayName: 'Route 100% to green'

.NET SDK Integration: Microsoft.FeatureManagement

src/Startup.cs
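// src/Startup.cs
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.FeatureManagement;
using Microsoft.FeatureManagement.FeatureFilters;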

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Connect to Azure App Configuration
        services.AddAzureAppConfiguration();

        // Add feature management with targeting
        services.AddFeatureManagement()
            .AddFeatureFilter<TargetingFilter>()
            .AddFeatureFilter<TimeWindowFilter>()
            .AddFeatureFilter<PercentageFilter>();

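        // Register targeting context accessor.
        // HttpContextTargetingContextAccessor is not shown here; it is assumed to be an
        // application-defined ITargetingContextAccessor that reads the user from HttpContext.
        services.AddHttpContextAccessor();
        services.AddSingleton<ITargetingContextAccessor,
            HttpContextTargetingContextAccessor>();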
    }

    public void Configure(IApplicationBuilder app)
    {
        // Enable dynamic configuration refresh
        app.UseAzureAppConfiguration();
    }
}

// Usage in a controller
[ApiController]
[Route("api/[controller]")]
public class CheckoutController : ControllerBase
{
    private readonly IFeatureManager _featureManager;

    public CheckoutController(IFeatureManager featureManager)
    {
        _featureManager = featureManager;
    }

    [HttpGet]
    public async Task<IActionResult> GetCheckout()
    {
        if (await _featureManager
            .IsEnabledAsync("payments.checkout.new-ui"))
        {
            return Ok(new { version = "v2", ui = "new" });
        }
        return Ok(new { version = "v1", ui = "classic" });
    }
}

Python SDK Integration

src/feature_flags.py
# src/feature_flags.py
from azure.appconfiguration.provider import load
from azure.identity import DefaultAzureCredential
from featuremanagement import FeatureManager, TargetingContext

credential = DefaultAzureCredential()

# Load configuration with feature flags
config = load(
    endpoint="https://appconf-prod-checkout.azconfig.io",
    credential=credential,
    feature_flag_enabled=True,
    feature_flag_refresh_enabled=True,
    refresh_interval=30,  # seconds
)

feature_manager = FeatureManager(config)

# Check flag state
async def get_checkout_experience(user_id: str):
    """Return checkout experience based on feature flag."""
    # The provider refreshes only when asked; this is a no-op until refresh_interval has elapsed
    config.refresh()
    # Targeting filters read user and group membership from a TargetingContext, not a plain dict
    context = TargetingContext(user_id=user_id, groups=["beta-testers"])

    if feature_manager.is_enabled(
        "payments.checkout.new-ui", context
    ):
        return {"version": "v2", "ui": "new"}
    return {"version": "v1", "ui": "classic"}

Flagger Canary Resource

k8s/canary.yaml
# k8s/canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-service
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  service:
    port: 8080
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - checkout.acme.com
  analysis:
    interval: 60s
    threshold: 5
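    # Non-linear steps matching the documented 1% → 5% → 25% → 50% progression; promotion then moves to 100%
    stepWeights: [1, 5, 25, 50]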
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 60s
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 60s
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.istio-system/
        metadata:
          cmd: "hey -z 60s -q 10 -c 2 http://checkout-service-canary.payments:8080/"

Azure Traffic Manager: Weighted Routing

modules/traffic-manager/main.tf
# modules/traffic-manager/main.tf
resource "azurerm_traffic_manager_profile" "canary" {
  name                   = "tm-${var.service_name}-canary"
  resource_group_name    = var.resource_group_name
  traffic_routing_method = "Weighted"

  dns_config {
    relative_name = var.service_name
    ttl           = 30
  }

  monitor_config {
    protocol                     = "HTTPS"
    port                         = 443
    path                         = "/health"
    interval_in_seconds          = 10
    timeout_in_seconds           = 5
    tolerated_number_of_failures = 2
  }

  tags = var.common_tags
}

resource "azurerm_traffic_manager_azure_endpoint" "stable" {
  name               = "stable"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.stable_public_ip_id
  weight             = var.stable_weight  # e.g., 95
}

resource "azurerm_traffic_manager_azure_endpoint" "canary" {
  name               = "canary"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.canary_public_ip_id
  weight             = var.canary_weight  # e.g., 5
}

Glossary

Term                     Definition
Feature Flag             A runtime toggle that controls feature visibility without code deployment. Stored in Azure App Configuration.
Feature Filter           A rule that determines whether a flag is enabled for a given context (targeting, time window, custom).
Targeting                Evaluating flag state based on user identity, group membership, or percentage rollout.
Canary Deployment        Gradually shifting traffic to a new version while monitoring health metrics. Uses Flagger + Istio on AKS.
Blue-Green Deployment    Running two identical environments and switching traffic atomically between them.
Progressive Rollout      Incrementally increasing the percentage of users exposed to a feature: 1% → 5% → 25% → 50% → 100%.
Kill Switch              An emergency flag disable mechanism that takes effect within seconds via App Configuration refresh.
Flag Debt                Accumulated feature flags that have outlived their purpose but remain in code and configuration.
Variant                  A specific configuration value associated with a feature flag, enabling A/B testing with multiple options.
A/B Test                 An experiment comparing two or more variants of a feature using statistical analysis.
Azure App Configuration  Centralized Azure service for managing application settings and feature flags.
Feature Manager          The App Configuration capability that provides feature flag evaluation with filters.
Flagger                  A Kubernetes operator that automates canary, A/B testing, and blue-green deployments.
Istio                    An open-source service mesh providing traffic management, security, and observability for Kubernetes.
Traffic Manager          Azure DNS-based traffic load balancer supporting weighted, priority, and geographic routing.
Application Insights     Azure APM service for monitoring live applications, detecting anomalies, and diagnosing issues.

Component: FeatureFlagDashboard

Screen Layout: A data table listing all feature flags with columns for name, status (active/canary/retired), targeting rules, rollout percentage progress bar, environment scope, TTL remaining, and last modified date. Filter controls for team, status, and environment. Bulk actions for enable/disable/archive. Each row expandable to show targeting details and flag audit history.

Component: CanaryMonitor

Screen Layout: Real-time dashboard showing active canary deployments. Each canary displays a progress gauge (current weight %), request success rate chart, p99 latency sparkline, and Flagger phase indicator (Initializing → Progressing → Promoting → Succeeded/Failed). Error budget burn-down visualization.

API Endpoints

Method  Endpoint                        Description
GET     /api/flags                      List all feature flags with state and targeting
POST    /api/flags                      Create a new feature flag
PUT     /api/flags/{name}               Update flag state or targeting rules
DELETE  /api/flags/{name}               Retire and remove a feature flag
GET     /api/canary/{service}           Get canary deployment status and metrics
POST    /api/canary/{service}/rollback  Trigger immediate canary rollback

Mock Data

mock-data/feature-flags.json
{
  "feature_flags": [
    { "name": "payments.checkout.new-ui", "status": "active", "rollout_pct": 50, "env": "production", "ttl_days_remaining": 45, "created": "2025-12-01" },
    { "name": "catalog.search.vector-ranking", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 82, "created": "2025-12-15" },
    { "name": "orders.fulfillment.batch-processing", "status": "active", "rollout_pct": 100, "env": "production", "ttl_days_remaining": 12, "created": "2025-10-01" },
    { "name": "identity.auth.passkey-login", "status": "active", "rollout_pct": 25, "env": "staging", "ttl_days_remaining": 67, "created": "2025-12-20" },
    { "name": "payments.refunds.instant-refund", "status": "canary", "rollout_pct": 1, "env": "production", "ttl_days_remaining": 88, "created": "2026-01-05" },
    { "name": "catalog.recommendations.ml-v3", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-08-01" },
    { "name": "orders.tracking.real-time-map", "status": "active", "rollout_pct": 75, "env": "production", "ttl_days_remaining": 30, "created": "2025-11-15" },
    { "name": "identity.mfa.biometric-prompt", "status": "active", "rollout_pct": 10, "env": "staging", "ttl_days_remaining": 60, "created": "2025-12-28" },
    { "name": "payments.pricing.dynamic-discount", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 85, "created": "2026-01-10" },
    { "name": "catalog.images.webp-optimization", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-07-01" }
  ]
}
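
To show how the mock data might feed the flag debt report, the sketch below counts non-retired flags per team and lists flags that are still active at 100% rollout, which are natural candidates for the Retire phase. The function names are hypothetical, and treating "active at 100%" as a retirement signal is an assumption rather than a rule from this specification.

```python
from collections import Counter
from typing import Dict, List

# name, status, and rollout_pct copied from the mock data above
FLAGS = [
    {"name": "payments.checkout.new-ui", "status": "active", "rollout_pct": 50},
    {"name": "catalog.search.vector-ranking", "status": "canary", "rollout_pct": 5},
    {"name": "orders.fulfillment.batch-processing", "status": "active", "rollout_pct": 100},
    {"name": "identity.auth.passkey-login", "status": "active", "rollout_pct": 25},
    {"name": "payments.refunds.instant-refund", "status": "canary", "rollout_pct": 1},
    {"name": "catalog.recommendations.ml-v3", "status": "retired", "rollout_pct": 0},
    {"name": "orders.tracking.real-time-map", "status": "active", "rollout_pct": 75},
    {"name": "identity.mfa.biometric-prompt", "status": "active", "rollout_pct": 10},
    {"name": "payments.pricing.dynamic-discount", "status": "canary", "rollout_pct": 5},
    {"name": "catalog.images.webp-optimization", "status": "retired", "rollout_pct": 0},
]


def active_flags_by_team(flags: List[dict]) -> Dict[str, int]:
    """Count non-retired flags per team, where the team is the first name segment."""
    live = (f for f in flags if f["status"] != "retired")
    return dict(Counter(f["name"].split(".")[0] for f in live))


def retirement_candidates(flags: List[dict]) -> List[str]:
    """List flags that are fully rolled out but not yet retired."""
    return [f["name"] for f in flags if f["status"] == "active" and f["rollout_pct"] == 100]


if __name__ == "__main__":
    print(active_flags_by_team(FLAGS))
    print(retirement_candidates(FLAGS))
```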

02 CI/CD Pipeline Automation

Pipeline Architecture (Required)

All CI/CD pipelines use Azure DevOps multi-stage YAML pipelines stored in the application repository. Pipelines follow a four-stage model:

  • Build: Compile & test
  • Test: Quality gates
  • Staging: Pre-production
  • Production: Live deployment

Branch Strategy

Trunk-based development with short-lived feature branches. Release branches for hotfixes only.

  • main — production-ready trunk, CI triggers on merge
  • feature/* — short-lived branches (< 3 days), PR required for merge
  • release/* — created from main for hotfix releases only
  • No long-lived develop branches — feature flags replace them
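
The branch rules above can be expressed as a small lookup, which is useful in a repository hook or PR check that rejects branches outside the strategy. This is an illustrative sketch; `BRANCH_RULES` and `classify_branch` are hypothetical names, and the labels "trunk", "feature", and "release" are chosen here for readability.

```python
from fnmatch import fnmatchcase

# Patterns taken from the branch list above; the labels are illustrative.
BRANCH_RULES = [
    ("main", "trunk"),
    ("feature/*", "feature"),
    ("release/*", "release"),
]


def classify_branch(name: str) -> str:
    """Return the branch kind, or raise ValueError for branches outside the strategy."""
    for pattern, kind in BRANCH_RULES:
        if fnmatchcase(name, pattern):
            return kind
    raise ValueError(f"branch {name!r} does not fit the branch strategy")


if __name__ == "__main__":
    for branch in ["main", "feature/passkey-login", "release/2.4.1"]:
        print(branch, "->", classify_branch(branch))
```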

Pipeline Triggers & Approvals

  • CI trigger: automatic on PR merge to main
  • Scheduled builds: nightly for integration tests and security scans
  • Environment approvals: staging auto-deploys; production requires manual approval
  • Approval checks: branch protection, required reviews, artifact verification

Artifact Management

Use Azure Artifacts for package feeds (NuGet, npm) and Azure Container Registry (ACR) for Docker images.

  • All artifacts are versioned with build ID and Git SHA
  • ACR geo-replicated to eastus2 and westus2 for redundancy
  • Image vulnerability scanning enabled via Defender for Containers
  • Retention policy: keep last 30 production images, purge untagged after 7 days
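
The retention policy above can be sketched as a selection over image metadata. The example assumes that every tagged image counts as a production image, which this section does not define, and it only computes which digests would be purged; it does not call ACR. `Image` and `images_to_purge` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional


class Image(NamedTuple):
    digest: str
    tag: Optional[str]  # None for untagged manifests
    pushed: datetime


def images_to_purge(
    images: Iterable[Image],
    now: datetime,
    keep_tagged: int = 30,
    untagged_max_age: timedelta = timedelta(days=7),
) -> List[str]:
    """Return digests to delete: tagged images beyond the newest N, plus old untagged ones."""
    images = list(images)
    tagged = sorted((i for i in images if i.tag), key=lambda i: i.pushed, reverse=True)
    purge = [i.digest for i in tagged[keep_tagged:]]
    purge += [i.digest for i in images if i.tag is None and now - i.pushed > untagged_max_age]
    return purge


if __name__ == "__main__":
    now = datetime(2026, 1, 15)
    images = [Image(f"sha256:t{n}", str(1000 - n), now - timedelta(hours=n)) for n in range(32)]
    images.append(Image("sha256:old", None, now - timedelta(days=8)))
    images.append(Image("sha256:fresh", None, now - timedelta(days=1)))
    print(images_to_purge(images, now))
```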

Secret Management in Pipelines (Required)

No Secrets in Pipelines

Never store secrets as pipeline variables. Use variable groups linked to Azure Key Vault to inject secrets at runtime. Service connections use workload identity federation (no client secrets).

Pipeline Optimization

  • Caching: cache NuGet/npm packages, Docker layers between builds
  • Parallel jobs: run unit tests and lint checks in parallel
  • Shared templates: central pipeline template repo to avoid duplication
  • Notifications: Microsoft Teams channel integration for build status

Multi-Stage Pipeline: .NET Microservice (Required)

pipelines/dotnet-microservice.yaml
# pipelines/dotnet-microservice.yaml
trigger:
  branches:
    include: [main]

pr:
  branches:
    include: [main]

variables:
  - group: keyvault-secrets  # Linked to Azure Key Vault
  - name: buildConfiguration
    value: 'Release'
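  - name: acrName
    value: 'acmeacr'
  - name: NUGET_PACKAGES  # Cache@2 below restores to this path
    value: '$(Pipeline.Workspace)/.nuget/packages'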

stages:
  - stage: Build
    displayName: 'Build & Unit Test'
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: UseDotNet@2
            inputs:
              version: '8.x'
          - task: Cache@2
            inputs:
              key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
              restoreKeys: 'nuget | "$(Agent.OS)"'
              path: '$(NUGET_PACKAGES)'
          - script: dotnet restore
            displayName: 'Restore packages'
          - script: dotnet build --configuration $(buildConfiguration) --no-restore
            displayName: 'Build'
          - script: dotnet test --configuration $(buildConfiguration) --no-build --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)/coverage
            displayName: 'Run unit tests'
          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: '$(Agent.TempDirectory)/coverage/**/coverage.cobertura.xml'

  - stage: SecurityScan
    displayName: 'Security & Quality'
    dependsOn: Build
    jobs:
      - job: SonarQube
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: SonarQubePrepare@5
            inputs:
              SonarQube: 'sonarqube-connection'
              scannerMode: 'MSBuild'
              projectKey: '$(Build.Repository.Name)'
          - script: dotnet build --configuration $(buildConfiguration)
            displayName: 'Build for analysis'
          - task: SonarQubeAnalyze@5
          - task: SonarQubePublish@5
            inputs:
              pollingTimeoutSec: '300'

      - job: ContainerScan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
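          - task: Docker@2
            displayName: 'Build image'
            inputs:
              # Registry and repository give the image the name that the Trivy step scans
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'build'
              Dockerfile: '**/Dockerfile'
              tags: 'scan-$(Build.BuildId)'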
          - script: |
              trivy image --exit-code 1 --severity HIGH,CRITICAL \
                $(acrName).azurecr.io/$(Build.Repository.Name):scan-$(Build.BuildId)
            displayName: 'Trivy vulnerability scan'

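  - stage: PushImage
    displayName: 'Push to ACR'
    dependsOn: SecurityScan
    # Do not push images from pull request validation builds
    condition: and(succeeded(), ne(variables['Build.Reason'], 'PullRequest'))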
    jobs:
      - job: Push
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: Docker@2
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: |
                $(Build.BuildId)
                $(Build.SourceVersion)

  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: PushImage
    jobs:
      - deployment: StagingDeploy
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-staging'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-staging.yaml'

  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: DeployStaging
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: ProductionDeploy
        environment: 'aks-production'  # Has approval gate
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-production'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-production.yaml'

Shared Pipeline Template (Extends Pattern)

templates/build-test-deploy.yaml
# templates/build-test-deploy.yaml
# Shared template — consumed via "extends" pattern
parameters:
  - name: service
    type: string
  - name: team
    type: string
  - name: language
    type: string
    values: ['dotnet', 'node', 'python']
  - name: chartPath
    type: string
    default: 'charts/$(service)'

stages:
  - stage: Build
    jobs:
      - job: Build
        steps:
          - template: steps/${{ parameters.language }}-build.yaml
  - stage: Test
    dependsOn: Build
    jobs:
      - job: QualityGate
        steps:
          - template: steps/sonarqube-scan.yaml
          - template: steps/container-scan.yaml
  - stage: DeployStaging
    dependsOn: Test
    jobs:
      - deployment: Staging
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - template: steps/helm-deploy.yaml
                  parameters:
                    environment: staging
                    service: ${{ parameters.service }}
                    team: ${{ parameters.team }}

# Consumer pipeline (azure-pipelines.yaml in service repo):
# extends:
#   template: templates/build-test-deploy.yaml@pipeline-templates
#   parameters:
#     service: checkout-service
#     team: payments
#     language: dotnet

Terraform Pipeline: Plan on PR, Apply on Merge

pipelines/terraform.yaml
# pipelines/terraform.yaml
trigger:
  branches:
    include: [main]
  paths:
    include: ['infra/**']

pr:
  branches:
    include: [main]
  paths:
    include: ['infra/**']

stages:
  - stage: Plan
    displayName: 'Terraform Plan'
    jobs:
      - job: TerraformPlan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: TerraformInstaller@1
            inputs:
              terraformVersion: '1.7.x'
          - task: TerraformCLI@1
            displayName: 'Init'
            inputs:
              command: 'init'
              workingDirectory: 'infra/$(environment)'
              backendType: 'azurerm'
              backendServiceArm: 'terraform-sp'
              backendAzureRmResourceGroupName: 'rg-terraform-state'
              backendAzureRmStorageAccountName: 'stterraformstate'
              backendAzureRmContainerName: 'tfstate'
              backendAzureRmKey: '$(environment)/terraform.tfstate'
          - task: TerraformCLI@1
            displayName: 'Plan'
            inputs:
              command: 'plan'
              workingDirectory: 'infra/$(environment)'
              environmentServiceName: 'terraform-sp'
              commandOptions: '-out=tfplan'
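          - task: TerraformCLI@1
            displayName: 'Show Plan'
            inputs:
              command: 'show'
              workingDirectory: 'infra/$(environment)'
              inputTargetPlanOrStateFilePath: 'infra/$(environment)/tfplan'
          # Publish the plan so the Apply stage, which runs on a different agent, can use it
          - publish: 'infra/$(environment)/tfplan'
            artifact: 'tfplan'

  - stage: Apply
    displayName: 'Terraform Apply'
    dependsOn: Plan
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: TerraformApply
        pool:
          vmImage: 'ubuntu-latest'
        environment: 'infra-$(environment)'  # Approval gate
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out the repository by default
                - checkout: self
                - task: TerraformInstaller@1
                  inputs:
                    terraformVersion: '1.7.x'
                - task: TerraformCLI@1
                  displayName: 'Init'
                  inputs:
                    command: 'init'
                    workingDirectory: 'infra/$(environment)'
                    backendType: 'azurerm'
                    backendServiceArm: 'terraform-sp'
                    backendAzureRmResourceGroupName: 'rg-terraform-state'
                    backendAzureRmStorageAccountName: 'stterraformstate'
                    backendAzureRmContainerName: 'tfstate'
                    backendAzureRmKey: '$(environment)/terraform.tfstate'
                # Deployment jobs download pipeline artifacts to $(Pipeline.Workspace)/<artifact>
                - task: TerraformCLI@1
                  displayName: 'Apply'
                  inputs:
                    command: 'apply'
                    workingDirectory: 'infra/$(environment)'
                    environmentServiceName: 'terraform-sp'
                    commandOptions: '$(Pipeline.Workspace)/tfplan/tfplan'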

Variable Group Linked to Key Vault

pipelines/variable-group.yaml
# In Azure DevOps UI or REST API:
# Variable group "keyvault-secrets" linked to Key Vault "kv-prod-checkout"
# Secrets auto-mapped: db-connection-string, api-key, jwt-secret

# Pipeline usage:
variables:
  - group: keyvault-secrets

steps:
  - script: |
      # Secret variables are not exposed to scripts automatically; the env block below maps them explicitly
      echo "Connection verified"
      dotnet run --urls "http://+:8080"
    env:
      ConnectionStrings__Default: $(db-connection-string)
      ApiKey: $(api-key)
      Jwt__Secret: $(jwt-secret)
    displayName: 'Run with Key Vault secrets'

Node.js Microservice Pipeline

pipelines/node-microservice.yaml
# pipelines/node-microservice.yaml
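variables:
  - name: npm_config_cache  # Cache@2 below restores to this path
    value: '$(Pipeline.Workspace)/.npm'

stages: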
  - stage: Build
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '20.x'
          - task: Cache@2
            inputs:
              key: 'npm | "$(Agent.OS)" | package-lock.json'
              path: '$(npm_config_cache)'
          - script: npm ci
            displayName: 'Install dependencies'
          - script: npm run lint
            displayName: 'Lint'
          - script: npm run test:coverage
            displayName: 'Run tests with coverage'
          - task: PublishTestResults@2
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: 'junit.xml'
          - script: npm run build
            displayName: 'Build'
          - task: Docker@2
            displayName: 'Build & push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: '$(Build.BuildId)'

Glossary

Term                          Definition
CI/CD                         Continuous Integration / Continuous Delivery: automated build, test, and deployment pipeline.
Pipeline Stage                A logical boundary in a pipeline (Build, Test, Deploy) that can have its own agent pool and conditions.
Job                           A unit of work within a stage that runs on a single agent.
Step                          The smallest unit of execution: a script or task within a job.
Task                          A pre-built Azure DevOps building block (e.g., Docker@2, HelmDeploy@0).
Environment                   An Azure DevOps resource representing a deployment target with approval gates.
Approval Gate                 A manual or automated check that must pass before a pipeline stage can execute.
Variable Group                A named set of variables that can be linked to Azure Key Vault for secret injection.
Service Connection            An Azure DevOps resource providing authenticated access to external services (Azure, ACR, K8s).
Workload Identity Federation  Secretless authentication using federated credentials; no client secret or certificate is stored.
Pipeline Template             Reusable YAML defining stages, jobs, or steps that can be consumed via extends or template references.
Extends Template              A pipeline pattern where the consumer pipeline inherits stages from a shared template.
Azure Artifacts               Package management service for NuGet, npm, Maven, Python, and Universal Packages.
ACR Task                      Azure Container Registry's built-in build engine for building images without a local Docker daemon.
Helm Release                  A deployed instance of a Helm chart on a Kubernetes cluster.

Component: PipelineDashboard

Screen Layout: A table of recent pipeline runs with columns for pipeline name, trigger (CI/manual/scheduled), branch, stages with colored status indicators (green=succeeded, yellow=running, red=failed, gray=skipped), duration, and actor. Click a row to see stage details with individual step logs. Filter by team, status, and date range. Summary cards at the top show total runs today, success rate, average duration, and deployment count.

Component: DeploymentTimeline

Screen Layout: Horizontal swimlane timeline with rows per environment (Dev, Staging, Production). Each deployment is a colored marker on the timeline showing service name, version, and timestamp. Hover for deployment details. Vertical lines connect promotions across environments.

Mock Data

mock-data/pipeline-runs.json
{
  "pipeline_runs": [
    { "id": 1042, "pipeline": "checkout-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 14, "actor": "jsmith" },
    { "id": 1041, "pipeline": "catalog-api", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "pending"}, "duration_min": 11, "actor": "adoe" },
    { "id": 1040, "pipeline": "order-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "failed", "staging": "skipped", "production": "skipped"}, "duration_min": 6, "actor": "mchen" },
    { "id": 1039, "pipeline": "identity-service", "trigger": "manual", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 18, "actor": "kpatel" },
    { "id": 1038, "pipeline": "infra-terraform", "trigger": "ci", "branch": "main", "stages": {"plan": "succeeded", "apply": "succeeded"}, "duration_min": 8, "actor": "platform-bot" },
    { "id": 1037, "pipeline": "checkout-service", "trigger": "ci", "branch": "feature/new-cart", "stages": {"build": "succeeded", "test": "succeeded", "staging": "skipped", "production": "skipped"}, "duration_min": 9, "actor": "jsmith" },
    { "id": 1036, "pipeline": "payment-processor", "trigger": "scheduled", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "running"}, "duration_min": 12, "actor": "scheduler" },
    { "id": 1035, "pipeline": "notification-svc", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 10, "actor": "rlee" }
  ],
  "summary": { "runs_today": 24, "success_rate": 87.5, "avg_duration_min": 12.3, "deployments_today": 6 }
}
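
The status colors and summary cards described above imply a roll-up from per-stage results to a single run-level status. A minimal sketch of that roll-up follows. The precedence order (failed, then running, then pending, then succeeded) and the helper names `run_status` and `success_rate` are assumptions for illustration, not part of the spec.

```python
from typing import Mapping

# Assumed precedence: the "worst" stage state decides the run-level status.
PRECEDENCE = ["failed", "running", "pending", "succeeded"]


def run_status(stages: Mapping[str, str]) -> str:
    """Collapse per-stage states into one run-level status; skipped stages are ignored."""
    states = {s for s in stages.values() if s != "skipped"}
    for state in PRECEDENCE:
        if state in states:
            return state
    return "skipped"  # every stage was skipped, or there were no stages


def success_rate(runs: list[dict]) -> float:
    """Percentage of finished runs (succeeded or failed) that succeeded."""
    statuses = [run_status(r["stages"]) for r in runs]
    finished = [s for s in statuses if s in ("succeeded", "failed")]
    if not finished:
        return 0.0
    return round(100.0 * finished.count("succeeded") / len(finished), 1)
```

Under these assumptions, a feature-branch run whose staging and production stages are skipped still counts as succeeded, and in-flight runs are left out of the success rate.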

03 PR Approval Workflows & Codified CAB

Branch Policy Configuration [Required]

All repositories enforce branch policies on main:

  • Minimum reviewers: 2 approvals required (1 for documentation-only changes)
  • Required reviewers by path: auto-assigned based on file path patterns (CODEOWNERS equivalent)
  • Build validation: CI pipeline must pass before merge is allowed
  • Comment resolution: all PR comments must be resolved
  • Work item linking: every PR linked to an Azure Boards work item
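
The reviewer rules above can be sketched as a small lookup. This is an illustration only: the path patterns, the group names, and the definition of a documentation-only change (everything under `docs/` or ending in `.md`) are assumptions, and in practice these rules live in Azure Repos branch policies rather than in code.

```python
from fnmatch import fnmatch

# Hypothetical path-to-group rules; real rules are configured as branch policies.
REVIEWER_RULES = {
    "infra/*": "platform-team",
    "charts/*": "platform-team",
    "auth/*": "security-team",
    "docs/*": "docs-owners",
}


def required_reviewers(changed_files: list[str]) -> set[str]:
    """Groups that are auto-assigned because a changed file matches their pattern."""
    return {
        group
        for path in changed_files
        for pattern, group in REVIEWER_RULES.items()
        if fnmatch(path, pattern)  # fnmatch's * also matches "/" in nested paths
    }


def minimum_approvals(changed_files: list[str]) -> int:
    """2 approvals by default; 1 when every changed file is documentation."""
    docs_only = bool(changed_files) and all(
        f.startswith("docs/") or f.endswith(".md") for f in changed_files
    )
    return 1 if docs_only else 2
```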

Status Checks

  • SonarQube quality gate: must pass (no new bugs, 80% coverage on new code)
  • Defender for DevOps: security scan for secrets, vulnerabilities, and IaC misconfigurations
  • License compliance: third-party license audit (no GPL in proprietary code)
  • Build validation pipeline: compile + unit tests must succeed

Merge Strategies

Branch Type | Strategy | Rationale
feature/* → main | Squash merge | Clean linear history, one commit per feature
release/* → main | Merge commit | Preserve release branch history for audit
hotfix/* → main | Merge commit | Preserve hotfix context

Code Review SLA

Priority | Response Time | Approval Time
P1 (Critical) | < 1 hour | < 4 hours
P2 (High) | < 4 hours | < 8 hours
P3 (Standard) | < 8 hours | < 24 hours
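
A breach check against the SLA table might look like the sketch below. It assumes elapsed wall-clock time (no business-hours calendar) and treats reaching the limit as a breach, since the table uses strict "<" bounds. The function name and signature are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional

# (response limit, approval limit) in hours, copied from the SLA table
SLA_HOURS = {"P1": (1, 4), "P2": (4, 8), "P3": (8, 24)}


def sla_breaches(priority: str, opened_at: datetime, now: datetime,
                 first_response_at: Optional[datetime] = None,
                 approved_at: Optional[datetime] = None) -> list[str]:
    """Return which SLA clocks ("response", "approval") have run out for a PR."""
    response_h, approval_h = SLA_HOURS[priority]
    breaches = []
    # A clock that is still running is measured up to `now`;
    # a stopped clock is measured up to the event that stopped it.
    if (first_response_at or now) - opened_at >= timedelta(hours=response_h):
        breaches.append("response")
    if (approved_at or now) - opened_at >= timedelta(hours=approval_h):
        breaches.append("approval")
    return breaches
```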

Codified CAB (Change Advisory Board) [Required]

The CAB process is automated through Azure DevOps pipeline environment approvals, replacing manual meetings for most changes.

Risk Assessment

Every PR is automatically scored for change risk based on:

  • Files changed: infrastructure/security files score higher
  • Blast radius: number of downstream services affected
  • Service tier: Tier-1 (customer-facing) changes score higher than Tier-3 (internal tooling)
  • Change size: large diffs score higher

Auto-Approval Rules

Risk Level | Score | Approval
Low | 0–30 | Auto-approved (documentation, config, non-Tier-1)
Medium | 31–70 | Single approver from service owner group
High | 71–100 | Multi-approver: service owner + platform team + security
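
The score bands above translate directly into an approval requirement. The sketch below assumes the approver group names `service-owners`, `platform-team`, and `security-team`; the function and its return shape are illustrative.

```python
def approval_requirement(score: int) -> dict:
    """Map a 0-100 risk score to the approval rule from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 30:
        return {"level": "low", "auto_approved": True, "approvers": []}
    if score <= 70:
        return {"level": "medium", "auto_approved": False,
                "approvers": ["service-owners"]}
    return {"level": "high", "auto_approved": False,
            "approvers": ["service-owners", "platform-team", "security-team"]}
```

The boundaries are inclusive on the upper end, so a score of exactly 30 is still auto-approved and 70 still needs only a single approver.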

CAB Automation

  • Auto-generate change summary from PR title, description, and file diff stats
  • Integration with ServiceNow via Azure DevOps Service Hooks for change record creation
  • Post-approval: deployment pipeline triggered automatically
  • Audit trail: all CAB decisions logged with risk score, approvers, and timestamps
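
As an illustration of the first bullet, a change summary can be assembled from the PR title, description, and diff stats with a few lines of string handling. The output layout and the 200-character description limit are assumptions, not a ServiceNow requirement.

```python
import textwrap


def change_summary(title: str, description: str, files_changed: int,
                   insertions: int, deletions: int, max_len: int = 200) -> str:
    """Build a short plain-text summary suitable for a change record."""
    # shorten() collapses whitespace and truncates overly long descriptions.
    body = textwrap.shorten(description, width=max_len, placeholder="...")
    stats = f"{files_changed} files changed, +{insertions}/-{deletions} lines"
    return f"{title.strip()}\n{body}\n{stats}"
```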

Quality Gates

  • Unit test coverage: ≥80% on new code, ≥60% overall
  • Integration test coverage: ≥60% for service boundary tests
  • Complexity limits: cyclomatic complexity < 15 per method
  • Duplicated lines: < 3% duplication in new code
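
These thresholds can be evaluated mechanically once the metrics are collected. A minimal sketch follows, assuming the metrics arrive as plain numbers (percentages for coverage and duplication); the `QualityMetrics` type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityMetrics:
    new_code_coverage: float        # percent
    overall_coverage: float         # percent
    integration_coverage: float     # percent
    max_cyclomatic_complexity: int  # highest value across methods
    new_code_duplication: float     # percent


def quality_gate_failures(m: QualityMetrics) -> list[str]:
    """Return one message per threshold that is not met; empty means the gate passes."""
    failures = []
    if m.new_code_coverage < 80:
        failures.append("unit coverage on new code below 80%")
    if m.overall_coverage < 60:
        failures.append("overall unit coverage below 60%")
    if m.integration_coverage < 60:
        failures.append("integration coverage below 60%")
    if m.max_cyclomatic_complexity >= 15:
        failures.append("cyclomatic complexity of 15 or more in a method")
    if m.new_code_duplication >= 3:
        failures.append("duplication in new code of 3% or more")
    return failures
```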

PR Template

.azuredevops/pull_request_template.md
## Summary


## Type of Change
- [ ] Feature (new functionality)
- [ ] Bug fix
- [ ] Infrastructure / Config
- [ ] Documentation
- [ ] Refactor (no behavior change)

## Checklist
- [ ] Unit tests added/updated (coverage ≥80%)
- [ ] Integration tests added (if applicable)
- [ ] Documentation updated
- [ ] Feature flag created (if new feature)
- [ ] Runbook updated (if operational change)
- [ ] Security review (if auth/data changes)
- [ ] Database migration tested (if schema change)

## Risk Assessment
- **Blast radius:** [Low / Medium / High]
- **Service tier:** [Tier-1 / Tier-2 / Tier-3]
- **Rollback plan:** [Describe rollback strategy]

## Related Work Items
AB#

Change Risk Assessment Script

scripts/risk-assessment.py
#!/usr/bin/env python3
"""Automated change risk assessment for Azure DevOps PRs."""
import json
import os
import sys
from dataclasses import dataclass

HIGH_RISK_PATHS = [
    "infra/", "terraform/", "k8s/", "charts/",
    "Dockerfile", "docker-compose",
    ".github/", ".azuredevops/", "pipelines/",
    "migrations/", "security/", "auth/",
]

TIER_1_SERVICES = [
    "checkout-service", "payment-processor",
    "identity-service", "order-service",
]

@dataclass
class RiskScore:
    total: int
    level: str  # low, medium, high
    factors: list

def assess_risk(
    changed_files: list[str],
    lines_changed: int,
    service_name: str,
    downstream_count: int,
) -> RiskScore:
    score = 0
    factors = []

    # File path risk
    infra_files = [f for f in changed_files
                   if any(f.startswith(p) for p in HIGH_RISK_PATHS)]
    if infra_files:
        score += 30
        factors.append(f"Infrastructure files: {len(infra_files)}")

    # Blast radius
    if downstream_count > 5:
        score += 25
        factors.append(f"High blast radius: {downstream_count} services")
    elif downstream_count > 2:
        score += 15
        factors.append(f"Medium blast radius: {downstream_count} services")

    # Service tier
    if service_name in TIER_1_SERVICES:
        score += 20
        factors.append(f"Tier-1 service: {service_name}")

    # Change size
    if lines_changed > 500:
        score += 15
        factors.append(f"Large change: {lines_changed} lines")
    elif lines_changed > 200:
        score += 10
        factors.append(f"Medium change: {lines_changed} lines")

    # Database migrations
    migration_files = [f for f in changed_files if "migration" in f.lower()]
    if migration_files:
        score += 20
        factors.append(f"Database migration: {len(migration_files)} files")

    # Cap first so the reported score and the level derive from the same value
    total = min(score, 100)
    level = "low" if total <= 30 else "medium" if total <= 70 else "high"
    return RiskScore(total=total, level=level, factors=factors)

if __name__ == "__main__":
    result = assess_risk(
        changed_files=json.loads(os.environ.get("CHANGED_FILES", "[]")),
        lines_changed=int(os.environ.get("LINES_CHANGED", "0")),
        service_name=os.environ.get("SERVICE_NAME", ""),
        downstream_count=int(os.environ.get("DOWNSTREAM_COUNT", "0")),
    )
    print(json.dumps({
        "score": result.total,
        "level": result.level,
        "factors": result.factors,
    }, indent=2))
    # Set Azure DevOps variable for pipeline gates
    print(f"##vso[task.setvariable variable=riskLevel]{result.level}")
    print(f"##vso[task.setvariable variable=riskScore]{result.total}")

Build Validation Pipeline

pipelines/pr-validation.yaml
# pipelines/pr-validation.yaml
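# Runs as a build validation policy on the main branch.
# Azure Repos ignores the YAML pr: trigger and relies on the branch policy;
# the pr: block below only takes effect for GitHub or Bitbucket repositories.
trigger: none
pr:
  branches:
    include: [main]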

pool:
  vmImage: 'ubuntu-latest'

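steps:
  # Full history so the risk step can diff against origin/main
  - checkout: self
    fetchDepth: 0

  # The SonarQube MSBuild scanner must be prepared before the build runs
  - task: SonarQubePrepare@5
    inputs:
      SonarQube: 'sonarqube-connection'
      scannerMode: 'MSBuild'
      projectKey: '$(Build.Repository.Name)'

  - script: dotnet restore && dotnet build --configuration Release
    displayName: 'Build'

  # --no-build reuses the Release output, so the configuration must match
  - script: dotnet test --no-build --configuration Release --collect:"XPlat Code Coverage"
    displayName: 'Unit tests'

  - task: SonarQubeAnalyze@5
  - task: SonarQubePublish@5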

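  # Risk assessment
  - script: |
      CHANGED=$(git diff --name-only origin/main...HEAD)
      # --numstat prints "added<TAB>deleted<TAB>path" per file; sum both columns (0 for an empty diff)
      LINES=$(git diff --numstat origin/main...HEAD | awk '{n += $1 + $2} END {print n + 0}')
      # Drop empty lines so an empty diff becomes [] rather than [""]
      export CHANGED_FILES=$(echo "$CHANGED" | jq -R 'select(length > 0)' | jq -s .)
      export LINES_CHANGED=$LINES
      # The agent checks out into a generic folder, so use the repository name, not the working directory
      export SERVICE_NAME="$(Build.Repository.Name)"
      # DOWNSTREAM_COUNT is not set here, so the script falls back to 0
      python scripts/risk-assessment.py
    displayName: 'Risk assessment'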

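  - script: |
      echo "Risk Level: $(riskLevel)"
      echo "Risk Score: $(riskScore)"
      # task.addattachment expects a file path, so write the result to a file first
      REPORT="$(Build.ArtifactStagingDirectory)/risk.txt"
      echo "$(riskLevel):$(riskScore)" > "$REPORT"
      echo "##vso[task.addattachment type=risk-assessment;name=risk]$REPORT"
    displayName: 'Report risk score'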

ServiceNow Integration via Service Hook

hooks/servicenow-change-record.json
{
  "publisherId": "rm",
  "eventType": "ms.vss-release.deployment-approval-pending-event",
  "consumerId": "webHooks",
  "consumerActionId": "httpRequest",
  "consumerInputs": {
    "url": "https://acme.service-now.com/api/sn_chg_rest/change",
    "httpHeaders": "Content-Type: application/json\nAuthorization: Bearer {{token}}",
    "resourceDetailsToSend": "all",
    "messagesToSend": "all",
    "detailedMessagesToSend": "all"
  },
  "publisherInputs": {
    "releaseEnvironmentId": "production",
    "releaseDefinitionId": "",
    "projectId": "{{projectId}}"
  }
}

Codified CAB: Environment with Approval Gates

pipelines/cab-gate.yaml
# Deployment stage with codified CAB approval
- stage: ProductionDeploy
  displayName: 'Production Deployment (CAB Gated)'
  dependsOn: StagingValidation
  jobs:
    - deployment: CabApproval
      # Environment 'aks-production' configured with:
      # - Approval: service-owners group + platform-team
      # - Branch control: only from refs/heads/main
      # - Business hours: Mon-Fri 06:00-18:00 CST
      # - Exclusive lock: one deployment at a time
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
            steps:
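              # riskLevel is set with task.setvariable in the PR validation job. Job-scoped
              # variables are not visible in this stage by default, so the value must be passed
              # in (for example as a pipeline variable or a mapped output variable).
              - script: |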
                  if [ "$(riskLevel)" = "low" ]; then
                    echo "Low-risk change — auto-approved by codified CAB"
                  fi
                displayName: 'CAB decision log'
              - task: HelmDeploy@0
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'upgrade'
                  chartType: 'FilePath'
                  chartPath: 'charts/$(service)'
                  releaseName: '$(service)'
                  overrideValues: 'image.tag=$(Build.BuildId)'

Glossary

Term | Definition
Branch Policy | Azure DevOps Repos configuration enforcing rules on branch operations (merge requirements, build validation).
Build Validation | A required pipeline run triggered on every PR that must succeed before merge is permitted.
Required Reviewer | An auto-assigned reviewer based on file path patterns, ensuring domain experts review relevant changes.
Status Check | An external validation (SonarQube, security scan) that reports pass/fail to the PR.
Squash Merge | Combining all commits in a feature branch into a single commit on the target branch.
PR Template | A markdown template pre-populating PR descriptions with checklists and required fields.
CODEOWNERS | File path–based automatic reviewer assignment (Azure DevOps uses "Required Reviewer" policies).
Codified CAB | An automated Change Advisory Board that uses pipeline gates and risk scores instead of manual meetings.
Change Risk Assessment | Automated scoring of a change's risk level based on file paths, blast radius, and service tier.
Blast Radius | The number of downstream services or users affected if the change introduces a defect.
Change Record | A formal record in ServiceNow/ITSM documenting the change, its risk, and approval history.
Service Hook | Azure DevOps webhook that sends event notifications to external services (ServiceNow, Slack).
Quality Gate | A set of measurable thresholds (coverage, complexity) that code must meet to proceed.
Technical Debt Score | A SonarQube metric estimating the effort required to fix all code maintainability issues.

Component: PRWorkflowBoard

Screen Layout: Kanban-style board with columns: Draft → In Review → Changes Requested → Approved → Merged. Each PR card shows title, author avatar, reviewer status (pending/approved/rejected), quality gate results (SonarQube, security scan), and age indicator. Cards have colored borders based on risk level (green=low, yellow=medium, red=high).

Component: CABDashboard

Screen Layout: Queue of pending change requests with columns: service, change summary, risk score (color-coded), requested by, approval status, and scheduled deployment window. History section showing past approvals with auto-approve rate percentage. Pie chart breakdown: auto-approved vs. manual approval vs. rejected.

Mock Data

mock-data/pr-workflow.json
{
  "pull_requests": [
    { "id": 1234, "title": "Add passkey authentication flow", "author": "jsmith", "status": "in_review", "reviewers": [{"name": "adoe", "status": "approved"}, {"name": "mchen", "status": "pending"}], "risk_score": 65, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "82%"} },
    { "id": 1235, "title": "Update Helm chart resource limits", "author": "kpatel", "status": "approved", "reviewers": [{"name": "platform-team", "status": "approved"}], "risk_score": 45, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} },
    { "id": 1236, "title": "Fix checkout total calculation", "author": "adoe", "status": "changes_requested", "reviewers": [{"name": "jsmith", "status": "changes_requested"}], "risk_score": 55, "risk_level": "medium", "quality_gates": {"sonarqube": "failed", "security": "passed", "coverage": "75%"} },
    { "id": 1237, "title": "Update README badges", "author": "rlee", "status": "merged", "reviewers": [{"name": "auto-approve", "status": "approved"}], "risk_score": 5, "risk_level": "low", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} }
  ],
  "cab_requests": [
    { "id": "CHG-2026-101", "service": "checkout-service", "summary": "Deploy passkey authentication", "risk_score": 65, "status": "pending", "approvers": ["service-owners", "security-team"] },
    { "id": "CHG-2026-100", "service": "catalog-api", "summary": "Enable vector search ranking", "risk_score": 25, "status": "auto-approved", "approvers": ["auto-cab"] },
    { "id": "CHG-2026-099", "service": "order-service", "summary": "Database schema migration v42", "risk_score": 85, "status": "approved", "approvers": ["service-owners", "platform-team", "dba-team"] },
    { "id": "CHG-2026-098", "service": "notification-svc", "summary": "Update email templates", "risk_score": 15, "status": "auto-approved", "approvers": ["auto-cab"] },
    { "id": "CHG-2026-097", "service": "identity-service", "summary": "Rotate JWT signing keys", "risk_score": 90, "status": "pending", "approvers": ["service-owners", "security-team", "platform-team"] }
  ]
}

04 Deployment Compliance Gates

Pre-Deployment Compliance Checks [Required]

Before any deployment to staging or production, the pipeline evaluates a set of compliance gates. All gates must pass — a single failure blocks the deployment.

Deployment Blocking Criteria

  • Critical/High vulnerabilities: Defender for DevOps or container scan results with Critical or High CVEs block deployment
  • Azure Policy violations: any resource that fails a policy with the deny effect
  • Missing required tags: cost-center, owner, environment, data-classification
  • Dependency SLA: if a critical dependency service is in degraded state, block deployment
  • Change window violation: no deployments during blackout periods (weekends, holidays, 10pm–6am CST)
  • Test coverage: unit test coverage below 80% on changed files
  • Missing runbook: Tier-1 services must have an updated runbook
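
The "all gates must pass" rule can be expressed as a simple aggregation over individual gate results. The sketch below is illustrative: the `GateResult` type and gate names are hypothetical, and an empty result list is treated as not ready so that a misconfigured pipeline fails closed.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def deployment_ready(gates: list[GateResult]) -> tuple[bool, list[str]]:
    """Return (ready, blockers); a single failed gate blocks the deployment."""
    if not gates:
        # Fail closed: no reported gates means nothing was actually verified.
        return False, ["no gate results reported"]
    blockers = [
        f"{g.name}: {g.detail}" if g.detail else g.name
        for g in gates
        if not g.passed
    ]
    return not blockers, blockers
```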

Azure Policy for AKS [Required]

Azure Policy for Kubernetes enforces guardrails on AKS workloads:

  • No privileged containers (deny)
  • Resource limits required on all containers (deny)
  • Only images from approved ACR (deny)
  • Required labels: app.kubernetes.io/name, acme.com/team (deny)
  • No host network or host PID (deny)
  • ReadOnlyRootFilesystem recommended (audit)
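
To make the deny rules concrete, the sketch below checks a pod manifest (already parsed into a dict) against them. It is a local lint for illustration only, not how Azure Policy for Kubernetes evaluates workloads; the registry name `acmeprod.azurecr.io` is a placeholder, and the audit-only ReadOnlyRootFilesystem rule is left out.

```python
APPROVED_REGISTRY = "acmeprod.azurecr.io/"  # hypothetical approved ACR
REQUIRED_LABELS = ("app.kubernetes.io/name", "acme.com/team")


def pod_violations(pod: dict) -> list[str]:
    """Return the deny-rule violations found in a parsed pod manifest."""
    violations = []
    labels = pod.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            violations.append(f"missing label {label}")

    spec = pod.get("spec", {})
    if spec.get("hostNetwork") or spec.get("hostPID"):
        violations.append("host network or host PID enabled")

    for c in spec.get("containers", []):
        name = c.get("name", "?")
        if c.get("securityContext", {}).get("privileged"):
            violations.append(f"{name}: privileged container")
        limits = c.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            violations.append(f"{name}: missing cpu/memory limits")
        if not c.get("image", "").startswith(APPROVED_REGISTRY):
            violations.append(f"{name}: image not from approved registry")
    return violations
```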

Defender for Cloud Integration

Defender for Cloud security recommendations must be addressed before promotion to production. The Secure Score must remain above the team's minimum threshold (default: 80/100).

Compliance as Code [Recommended]

All Azure Policy definitions are stored as Terraform in Git. Changes to policies follow the same PR review process as application code. Policy assignments are versioned and deployed via pipeline.

Deployment Windows & Blackout Periods

Window | Schedule | Allowed
Standard | Mon–Fri 6:00–18:00 CST | All deployments
Extended | Mon–Fri 18:00–22:00 CST | Tier-2/3 only, with approval
Blackout | Weekends, holidays, 22:00–6:00 CST | Emergency hotfixes only (P1)
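
The window table can be read as two small decisions: which window a timestamp falls into, and whether a given change may deploy in it. The sketch below assumes the timestamp is already expressed in Central time, that holidays are supplied by the caller, and that service tiers are the integers 1 to 3. All names are illustrative.

```python
from datetime import date, datetime


def deployment_window(local_time: datetime,
                      holidays: frozenset[date] = frozenset()) -> str:
    """Classify a Central-time timestamp as 'standard', 'extended', or 'blackout'."""
    if local_time.weekday() >= 5 or local_time.date() in holidays:
        return "blackout"  # weekends and holidays
    if 6 <= local_time.hour < 18:
        return "standard"
    if 18 <= local_time.hour < 22:
        return "extended"
    return "blackout"  # 22:00 to 06:00


def deployment_allowed(window: str, tier: int, approved: bool = False,
                       p1_hotfix: bool = False) -> bool:
    """Apply the 'Allowed' column of the window table."""
    if window == "standard":
        return True
    if window == "extended":
        return tier in (2, 3) and approved
    return p1_hotfix  # blackout: emergency P1 hotfixes only
```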

Post-Deployment Validation

  • Smoke tests: automated HTTP health checks against deployed endpoints
  • Synthetic monitoring: Application Insights availability tests for critical user flows
  • Automatic rollback: if smoke tests fail, pipeline triggers Helm rollback
  • Soak period: 30-minute monitoring window before declaring deployment successful
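
The rollback decision these checks feed into can be sketched as a pure function. The 1% error-rate threshold mirrors the soak script later in this module; treating a soak window with no traffic as healthy is an assumption that a team may well want to invert.

```python
def should_roll_back(smoke_statuses: dict[str, int], total_requests: int,
                     failed_requests: int, max_error_rate: float = 0.01) -> bool:
    """Decide on rollback from smoke-test status codes and soak-period request counts."""
    # Any smoke endpoint that did not return HTTP 200 triggers a rollback.
    if any(code != 200 for code in smoke_statuses.values()):
        return True
    # Assumption: no traffic during the soak window is not treated as a failure.
    if total_requests == 0:
        return False
    return failed_requests / total_requests > max_error_rate
```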

Audit Trail [Required]

Every deployment produces an immutable audit record in Azure DevOps + Log Analytics containing: who approved, what changed, when deployed, why (linked work item), risk score, and compliance gate results.
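
One possible shape for such a record is sketched below. The field names are inferred from the columns of the audit query later in this module, with the type suffixes (`_s`, `_d`) dropped on the assumption that the ingestion path appends them. The gate results are serialized to a JSON string because that query parses them back with `parse_json`. How the record is shipped to Log Analytics is out of scope here.

```python
import json


def build_audit_record(*, service: str, environment: str, version: str,
                       deployed_by: str, approved_by: str, risk_score: int,
                       risk_level: str, pipeline_run_id: str, work_item_id: str,
                       gates: list[dict], duration: str, status: str) -> dict:
    """Assemble one deployment audit record as a flat, JSON-serializable dict."""
    return {
        "service": service,
        "environment": environment,
        "version": version,
        "deployed_by": deployed_by,
        "approved_by": approved_by,
        "risk_score": risk_score,
        "risk_level": risk_level,
        "pipeline_run_id": pipeline_run_id,
        "work_item_id": work_item_id,
        # Stored as a JSON string, e.g. [{"name": "azure-policy", "status": "passed"}]
        "compliance_gates": json.dumps(gates),
        "duration": duration,
        "status": status,
    }
```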

Azure Policy: Deny Untagged Resources

policies/require-tags.tf
# policies/require-tags.tf
resource "azurerm_policy_definition" "require_tags" {
  name         = "require-mandatory-tags"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Require mandatory resource tags"
  description  = "Denies resource creation without required tags"

  policy_rule = jsonencode({
    if = {
      anyOf = [
        { field = "tags['cost-center']", exists = false },
        { field = "tags['owner']", exists = false },
        { field = "tags['environment']", exists = false },
        { field = "tags['data-classification']", exists = false }
      ]
    }
    then = { effect = "deny" }
  })
}

resource "azurerm_policy_definition" "aks_no_privileged" {
  name         = "aks-deny-privileged-containers"
  policy_type  = "Custom"
  mode         = "Microsoft.Kubernetes.Data"
  display_name = "AKS: Deny privileged containers"

  policy_rule = jsonencode({
    if = {
      field  = "type"
      equals = "Microsoft.ContainerService/managedClusters"
    }
    then = {
      effect = "deny"
      details = {
        templateInfo = {
          sourceType = "PublicURL"
          url        = "https://store.policy.core.windows.net/kubernetes/container-no-privilege/v2/template.yaml"
        }
        constraint = "https://store.policy.core.windows.net/kubernetes/container-no-privilege/v2/constraint.yaml"
      }
    }
  })
}

Azure Policy Initiative: Deployment Readiness

policies/deployment-readiness-initiative.tf
# policies/deployment-readiness-initiative.tf
resource "azurerm_policy_set_definition" "deployment_readiness" {
  name         = "deployment-readiness-v1"
  policy_type  = "Custom"
  display_name = "Deployment Readiness Initiative"
  description  = "All policies that must pass before production deployment"

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.require_tags.id
    reference_id         = "requireTags"
  }

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.aks_no_privileged.id
    reference_id         = "aksNoPrivileged"
  }

  policy_definition_reference {
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/febd0533-8e55-448f-b837-bd0e06f16469"
    reference_id         = "aksResourceLimits"
  }

  policy_definition_reference {
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/d2d3ab89-5a4e-4921-81fa-3c2c0a69fc6b"
    reference_id         = "aksApprovedImages"
  }
}

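# Subscription the initiative is assigned to (the provider's current subscription)
data "azurerm_subscription" "current" {}

resource "azurerm_subscription_policy_assignment" "readiness" {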
  name                 = "deploy-readiness"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_set_definition.deployment_readiness.id
  display_name         = "Deployment Readiness Checks"
  enforce              = true
}

Pre-Deployment Compliance Check Pipeline

pipelines/compliance-gate.yaml
# pipelines/compliance-gate.yaml
stages:
  - stage: ComplianceGate
    displayName: 'Pre-Deployment Compliance'
    jobs:
      - job: PolicyCheck
        displayName: 'Azure Policy Compliance'
        steps:
          - task: AzureCLI@2
            displayName: 'Check policy compliance'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                # Count non-compliant resources in the target resource group
                NON_COMPLIANT=$(az policy state summarize \
                  --resource-group "rg-$(environment)-$(service)" \
                  --query "results.nonCompliantResources" -o tsv)
                # Fail closed: an empty result would make the numeric test error out and skip this block
                if [ "${NON_COMPLIANT:-1}" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]$NON_COMPLIANT non-compliant resources found"
                  az policy state list \
                    --resource-group "rg-$(environment)-$(service)" \
                    --filter "complianceState eq 'NonCompliant'" \
                    --query "[].{resource:resourceId, policy:policyDefinitionName}" -o table
                  exit 1
                fi
                echo "All resources compliant"

      - job: VulnerabilityCheck
        displayName: 'Defender Vulnerability Check'
        steps:
          - task: AzureCLI@2
            displayName: 'Check Defender findings'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                CRITICAL=$(az security sub-assessment list \
                  --assessed-resource-id "/subscriptions/$(subscriptionId)" \
                  --query "length([?status.severity=='High' || status.severity=='Critical'])" -o tsv)
                if [ "$CRITICAL" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]$CRITICAL critical/high vulnerabilities found"
                  exit 1
                fi

      - job: BlackoutCheck
        displayName: 'Deployment Window Check'
        steps:
          - script: |
              HOUR=$(TZ="America/Chicago" date +%H)
              DAY=$(TZ="America/Chicago" date +%u)
              if [ "$DAY" -gt 5 ] || [ "$HOUR" -lt 6 ] || [ "$HOUR" -ge 22 ]; then
                echo "##vso[task.logissue type=error]Deployment blocked: outside deployment window"
                echo "Current time: $(TZ='America/Chicago' date)"
                echo "Allowed: Mon-Fri 06:00-22:00 CST"
                exit 1
              fi
              echo "Within deployment window"
            displayName: 'Check deployment window'

Post-Deployment Smoke Test & Rollback

pipelines/post-deploy-validation.yaml
# pipelines/post-deploy-validation.yaml
- stage: SmokeTest
  displayName: 'Post-Deployment Validation'
  dependsOn: ProductionDeploy
  jobs:
    - job: SmokeTests
      steps:
        - script: |
            echo "Running smoke tests against $(service).$(domain)"
            for endpoint in /health /ready /api/status; do
              STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
                "https://$(service).$(domain)${endpoint}")
              if [ "$STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Smoke test failed: ${endpoint} returned $STATUS"
                exit 1
              fi
              echo "✓ ${endpoint}: $STATUS"
            done
          displayName: 'HTTP smoke tests'

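        # az needs an authenticated session, which a plain script step does not have,
        # so the soak check runs inside AzureCLI@2
        - task: AzureCLI@2
          displayName: 'Soak period monitoring'
          inputs:
            azureSubscription: 'azure-service-connection'
            scriptType: 'bash'
            scriptLocation: 'inlineScript'
            inlineScript: |
              echo "Soak period: monitoring for 30 minutes..."
              sleep 1800
              # Check Application Insights for an error spike
              # (requires the application-insights CLI extension)
              ERROR_RATE=$(az monitor app-insights query \
                --app "$(appInsightsName)" \
                --analytics-query "requests | where timestamp > ago(30m) | summarize errorRate=todouble(countif(success==false))/count()" \
                --query "tables[0].rows[0][0]" -o tsv)
              if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
                echo "##vso[task.logissue type=error]Error rate $ERROR_RATE exceeds threshold"
                exit 1
              fi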

- stage: AutoRollback
  displayName: 'Auto-Rollback on Failure'
  dependsOn: SmokeTest
  condition: failed()
  jobs:
    - deployment: Rollback
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
            steps:
              - task: HelmDeploy@0
                displayName: 'Helm rollback'
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'rollback'
                  releaseName: '$(service)'
                  arguments: '--wait --timeout 5m'
              - script: |
                  echo "##vso[task.logissue type=warning]ROLLBACK EXECUTED for $(service)"
                  # Notify Teams channel
                  curl -H 'Content-Type: application/json' \
                    -d '{"text":"⚠️ Auto-rollback executed for $(service) in production"}' \
                    "$(teamsWebhookUrl)"
                displayName: 'Notify rollback'

Log Analytics: Deployment Audit Trail (KQL)

queries/deployment-audit.kql
// Deployment Audit Trail — KQL Query
// Run in Log Analytics workspace
DeploymentAudit_CL
| where TimeGenerated > ago(30d)
| project
    TimeGenerated,
    Service = service_s,
    Environment = environment_s,
    Version = version_s,
    DeployedBy = deployed_by_s,
    ApprovedBy = approved_by_s,
    RiskScore = risk_score_d,
    RiskLevel = risk_level_s,
    PipelineRunId = pipeline_run_id_s,
    WorkItemId = work_item_id_s,
    ComplianceGates = compliance_gates_s,
    Duration = duration_s,
    Status = status_s
| order by TimeGenerated desc

// Compliance Gate Summary — last 7 days
DeploymentAudit_CL
| where TimeGenerated > ago(7d)
| extend Gates = parse_json(compliance_gates_s)
| mv-expand Gate = Gates
| summarize
    PassCount = countif(Gate.status == "passed"),
    FailCount = countif(Gate.status == "failed")
    by GateName = tostring(Gate.name)
| extend PassRate = round(todouble(PassCount) / (PassCount + FailCount) * 100, 1)
| order by PassRate asc

Azure Monitor: Deployment Health Alert

modules/monitoring/deployment-alert.tf
# modules/monitoring/deployment-alert.tf
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "deploy_health" {
  name                = "alert-deployment-error-spike-${var.service_name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  description         = "Error rate spike detected after deployment"
  severity            = 1
  enabled             = true

  scopes               = [var.application_insights_id]
  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"

  criteria {
    query = <<-KQL
      requests
      | where timestamp > ago(15m)
      | summarize
          errorRate = todouble(countif(success == false)) / count(),
          totalRequests = count()
      | where errorRate > 0.01 and totalRequests > 100
    KQL
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [var.action_group_id]
  }

  tags = var.common_tags
}

Glossary

Term | Definition
Compliance Gate | An automated check that must pass before a deployment is permitted to proceed.
Azure Policy | Azure-native governance service that enforces organizational rules on resources at scale.
Policy Initiative | A collection of Azure Policy definitions grouped and assigned together as a single unit.
Deny Effect | Azure Policy effect that blocks resource creation or update if the rule is violated.
Audit Effect | Azure Policy effect that flags non-compliance without blocking the operation.
Defender for Cloud | Unified security posture management and threat protection across Azure resources.
Secure Score | A Defender for Cloud metric (0–100) representing the security posture of your subscriptions.
Deployment Readiness | The state where all pre-deployment gates (tests, scans, policies, approvals) have passed.
Blackout Window | A time period during which production deployments are prohibited (weekends, holidays, late night).
Smoke Test | Minimal health-check tests run immediately after deployment to validate basic functionality.
Synthetic Monitor | Application Insights availability tests that simulate user flows at regular intervals.
Rollback Trigger | An automated condition (error rate spike, smoke test failure) that initiates deployment rollback.
Change Window | The approved time range during which deployments are permitted.
Data Classification Tag | A required resource tag indicating the sensitivity level of data (public, internal, confidential, restricted).
Immutable Audit Log | A tamper-proof record of all deployment actions stored in Log Analytics with retention policies.
KQL | Kusto Query Language: the query language used in Azure Log Analytics and Application Insights.
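
The Blackout Window definition above (weekends, holidays, late night) can be expressed as a small predicate for the deployment window gate. This is a sketch under stated assumptions: the UTC evaluation, the 22:00 to 06:00 night hours, and the `BlackoutConfig` shape are all hypothetical and not specified in this document.

```typescript
// Hypothetical configuration; none of these values come from the specification.
interface BlackoutConfig {
  holidays: string[];     // ISO dates (YYYY-MM-DD)
  nightStartHour: number; // assumed start of the late-night window, e.g. 22
  nightEndHour: number;   // assumed end of the late-night window, e.g. 6
}

// Returns true when production deployments should be blocked at the given instant.
// All checks are evaluated in UTC (an assumption).
function isBlackout(at: Date, cfg: BlackoutConfig): boolean {
  const day = at.getUTCDay(); // 0 = Sunday, 6 = Saturday
  if (day === 0 || day === 6) return true;

  const isoDate = at.toISOString().slice(0, 10);
  if (cfg.holidays.indexOf(isoDate) !== -1) return true;

  const hour = at.getUTCHours();
  return hour >= cfg.nightStartHour || hour < cfg.nightEndHour;
}
```

A Change Window check would be the complement of this predicate, optionally narrowed to an approved time range.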

Component: ComplianceGateway

Screen Layout: A vertical checklist for a specific deployment, with each gate showing: gate name, status (pass/fail/pending), details, and timestamp. Gates include: Azure Policy compliance, vulnerability scan, tag validation, deployment window check, test coverage, runbook verification, and approval status. Overall readiness indicator (green checkmark or red block) at the top.
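
The overall readiness indicator can be derived from the gate list. The sketch below assumes that only a full set of passing gates counts as ready and that an empty checklist is treated as blocked; `overallReadiness` and `gateSummary` are hypothetical helper names, and the field types are assumptions based on the ComplianceGate entity in the data model.

```typescript
type GateStatus = "pass" | "fail" | "pending";

// Field names follow the ComplianceGate entity; the types are assumptions.
interface ComplianceGate {
  name: string;
  status: GateStatus;
  details: string;
  timestamp: string;
}

type Readiness = "ready" | "blocked";

// Green checkmark only when every gate has passed; pending gates still block.
function overallReadiness(gates: ComplianceGate[]): Readiness {
  if (gates.length === 0) return "blocked"; // assumption: no gates means not ready
  return gates.every((g) => g.status === "pass") ? "ready" : "blocked";
}

// Produces the "N/M passed" summary string used in the audit log mock data.
function gateSummary(gates: ComplianceGate[]): string {
  const passed = gates.filter((g) => g.status === "pass").length;
  return `${passed}/${gates.length} passed`;
}
```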

Component: PolicyDashboard

Screen Layout: Azure Policy compliance overview with compliance percentage per resource group. Drill-down table showing individual policy violations with resource name, policy name, effect, and remediation guidance. Filter by subscription, resource group, and policy category.

Component: DeploymentAuditLog

Screen Layout: Searchable, sortable table of all deployments with columns: timestamp, service, environment, version, deployed by, risk score, compliance gates summary, duration, and status. Filter by service, environment, date range, and status. Export to CSV.
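
The filter and CSV export behaviour described above could look roughly like the following. This is an illustrative sketch: `AuditFilter`, `filterAudit`, `csvCell`, and `toCsv` are hypothetical names, the field types are inferred from the mock data, and the column order is an assumption.

```typescript
// Field names follow the audit_log entries in the mock data; types are inferred.
interface DeploymentAudit {
  timestamp: string;
  service: string;
  env: string;
  version: string;
  deployed_by: string;
  risk_score: number;
  gates: string;
  duration: string;
  status: string;
}

// Hypothetical filter shape; every field is optional.
interface AuditFilter {
  service?: string;
  env?: string;
  status?: string;
  from?: string; // ISO 8601 timestamp, inclusive
  to?: string;   // ISO 8601 timestamp, inclusive
}

function filterAudit(rows: DeploymentAudit[], f: AuditFilter): DeploymentAudit[] {
  return rows.filter((row) => {
    const t = Date.parse(row.timestamp);
    return (
      (f.service === undefined || row.service === f.service) &&
      (f.env === undefined || row.env === f.env) &&
      (f.status === undefined || row.status === f.status) &&
      (f.from === undefined || t >= Date.parse(f.from)) &&
      (f.to === undefined || t <= Date.parse(f.to))
    );
  });
}

const COLUMNS: (keyof DeploymentAudit)[] = [
  "timestamp", "service", "env", "version", "deployed_by",
  "risk_score", "gates", "duration", "status",
];

// Quotes a cell only when it contains a comma, a double quote, or a newline.
function csvCell(value: string | number): string {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows: DeploymentAudit[]): string {
  const header = COLUMNS.join(",");
  const lines = rows.map((row) => COLUMNS.map((c) => csvCell(row[c])).join(","));
  return [header].concat(lines).join("\n");
}
```

Exporting the filtered rows rather than the full table keeps the CSV consistent with what the user sees on screen.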

Mock Data

mock-data/compliance-gates.json
{
  "services": [
    { "name": "checkout-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 92, "deployment_ready": true },
    { "name": "catalog-api", "policy_compliance": 95, "vuln_count": 2, "tag_compliance": 100, "secure_score": 85, "deployment_ready": false },
    { "name": "order-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 95, "secure_score": 88, "deployment_ready": false },
    { "name": "identity-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 95, "deployment_ready": true },
    { "name": "notification-svc", "policy_compliance": 100, "vuln_count": 1, "tag_compliance": 100, "secure_score": 90, "deployment_ready": true },
    { "name": "payment-processor", "policy_compliance": 98, "vuln_count": 0, "tag_compliance": 100, "secure_score": 91, "deployment_ready": true }
  ],
  "audit_log": [
    { "timestamp": "2026-02-28T14:32:00Z", "service": "checkout-service", "env": "production", "version": "3.4.1", "deployed_by": "jsmith", "risk_score": 35, "gates": "6/6 passed", "duration": "14m", "status": "succeeded" },
    { "timestamp": "2026-02-28T11:15:00Z", "service": "catalog-api", "env": "staging", "version": "2.8.0", "deployed_by": "adoe", "risk_score": 50, "gates": "5/6 passed", "duration": "11m", "status": "failed" },
    { "timestamp": "2026-02-27T16:45:00Z", "service": "order-service", "env": "production", "version": "4.1.2", "deployed_by": "mchen", "risk_score": 25, "gates": "6/6 passed", "duration": "16m", "status": "succeeded" },
    { "timestamp": "2026-02-27T10:20:00Z", "service": "identity-service", "env": "production", "version": "1.9.0", "deployed_by": "kpatel", "risk_score": 70, "gates": "6/6 passed", "duration": "22m", "status": "succeeded" },
    { "timestamp": "2026-02-26T15:00:00Z", "service": "notification-svc", "env": "production", "version": "2.3.5", "deployed_by": "rlee", "risk_score": 15, "gates": "6/6 passed", "duration": "9m", "status": "succeeded" }
  ]
}

Unified Demo Application Specification

Acme Feature Console (Azure)

Tech Stack: React 18, TypeScript, Tailwind CSS, Recharts, Mock API layer

The unified demo application brings together all four modules into a single dashboard experience for managing the complete feature lifecycle.

Key Screens

  1. Feature Flag Dashboard — All flags with state, targeting rules, rollout percentages, and TTL tracking
  2. Canary Monitor — Real-time canary deployment health, traffic weight, and success rate metrics
  3. Pipeline Dashboard — Pipeline runs, stage status, success rates, and deployment frequency
  4. PR Workflow Board — Kanban view of PRs with quality gates and review status
  5. CAB Dashboard — Change requests with risk scores, auto-approval history, and approval queue
  6. Compliance Gateway — Pre-deployment checklist with gate pass/fail status
  7. Deployment Audit Log — Searchable history of all deployments with compliance data

Architecture Overview

architecture/component-tree.ts
// Component Tree
App
├── Layout
│   ├── TopBar (theme toggle, user menu)
│   └── Sidebar (navigation)
├── Pages
│   ├── FeatureFlagDashboard
│   │   ├── FlagTable (sortable, filterable)
│   │   ├── FlagDetail (targeting rules, audit log)
│   │   └── CreateFlagModal
│   ├── CanaryMonitor
│   │   ├── CanaryCard (per deployment)
│   │   ├── MetricsChart (Recharts)
│   │   └── RollbackButton
│   ├── PipelineDashboard
│   │   ├── PipelineRunTable
│   │   ├── StageIndicator
│   │   └── DeploymentTimeline
│   ├── PRWorkflowBoard
│   │   ├── KanbanColumn (per status)
│   │   ├── PRCard
│   │   └── QualityGateBadges
│   ├── CABDashboard
│   │   ├── ChangeRequestQueue
│   │   ├── RiskScoreBadge
│   │   └── ApprovalHistory
│   ├── ComplianceGateway
│   │   ├── GateChecklist
│   │   ├── PolicyDashboard
│   │   └── ReadinessIndicator
│   └── DeploymentAuditLog
│       ├── AuditTable (searchable)
│       └── ExportButton
└── Services
    ├── apiClient.ts (mock fetch layer)
    ├── flagService.ts
    ├── pipelineService.ts
    └── complianceService.ts
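
The mock fetch layer in `apiClient.ts` is not specified further in this document. One possible shape, as a sketch only, is an in-memory fixture table behind a Promise-returning function; `fixtures`, `resolveFixture`, and `apiGet` are hypothetical names, and the two sample services reuse values from the mock data above.

```typescript
// Hypothetical fixture table keyed by API path.
const fixtures: { [path: string]: unknown } = {
  "/api/services": [
    { name: "checkout-service", policy_compliance: 100, vuln_count: 0, tag_compliance: 100, secure_score: 92, deployment_ready: true },
    { name: "catalog-api", policy_compliance: 95, vuln_count: 2, tag_compliance: 100, secure_score: 85, deployment_ready: false },
  ],
};

// Synchronous lookup; throws for paths that have no fixture.
function resolveFixture(path: string): unknown {
  if (!Object.prototype.hasOwnProperty.call(fixtures, path)) {
    throw new Error(`No mock data registered for ${path}`);
  }
  return fixtures[path];
}

// Async wrapper so callers use the same shape a real fetch-based client would have.
// The cast is unchecked: the caller is trusted to ask for the right type.
function apiGet<T>(path: string): Promise<T> {
  try {
    return Promise.resolve(resolveFixture(path) as T);
  } catch (err) {
    return Promise.reject(err);
  }
}
```

Services such as `flagService.ts` could then call `apiGet` without knowing whether the data comes from fixtures or a live backend.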

Data Model

types/data-model.ts
// Core Entities
FeatureFlag: { name, status, rollout_pct, env, targeting, ttl_days_remaining, created, team }
CanaryDeployment: { service, phase, weight_pct, success_rate, p99_latency, started_at }
PipelineRun: { id, pipeline, trigger, branch, stages, duration_min, actor, started_at }
PullRequest: { id, title, author, status, reviewers[], risk_score, quality_gates }
ChangeRequest: { id, service, summary, risk_score, risk_level, status, approvers[] }
ComplianceGate: { name, status, details, timestamp }
PolicyViolation: { resource_id, policy_name, effect, severity, detected_at }
DeploymentAudit: { timestamp, service, env, version, deployed_by, risk_score, gates, duration, status }
ServiceHealth: { name, policy_compliance, vuln_count, tag_compliance, secure_score, deployment_ready }
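
The entity list above gives field names only. As an illustration of how one entity might be typed and used for TTL tracking, the sketch below spells out FeatureFlag; every field type is an assumption, and `flagsNearingExpiry` is a hypothetical helper.

```typescript
// Field names come from the entity list; the types are assumptions.
interface FeatureFlag {
  name: string;
  status: string;
  rollout_pct: number;        // assumed 0-100
  env: string;
  targeting: string[];        // assumed list of targeting rule descriptions
  ttl_days_remaining: number;
  created: string;            // assumed ISO 8601 date
  team: string;
}

// Returns flags whose remaining TTL is at or below the threshold, soonest first.
// The input array is not modified.
function flagsNearingExpiry(flags: FeatureFlag[], thresholdDays: number): FeatureFlag[] {
  return flags
    .filter((f) => f.ttl_days_remaining <= thresholdDays)
    .sort((a, b) => a.ttl_days_remaining - b.ttl_days_remaining);
}
```

A dashboard could use the result to highlight flags that are due for the Retire phase.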

Full API Schema

Method | Endpoint | Module | Description
GET | /api/flags | M1 | List all feature flags with state and targeting
POST | /api/flags | M1 | Create a new feature flag
PUT | /api/flags/{name} | M1 | Update flag state or targeting rules
DELETE | /api/flags/{name} | M1 | Retire and remove a feature flag
GET | /api/canary/{service} | M1 | Get canary deployment status
POST | /api/canary/{service}/rollback | M1 | Trigger canary rollback
GET | /api/pipelines/runs | M2 | List pipeline runs with stage status
GET | /api/pipelines/runs/{id} | M2 | Get detailed pipeline run info
GET | /api/pipelines/metrics | M2 | Pipeline success rate, deployment frequency
POST | /api/pipelines/trigger | M2 | Manually trigger a pipeline run
GET | /api/prs | M3 | List PRs with review and quality gate status
GET | /api/prs/{id}/risk | M3 | Get risk assessment for a PR
GET | /api/cab/queue | M3 | List pending CAB change requests
POST | /api/cab/{id}/approve | M3 | Approve a change request
GET | /api/compliance/readiness/{service} | M4 | Get deployment readiness checklist
GET | /api/compliance/policies | M4 | List policy compliance per resource group
GET | /api/compliance/violations | M4 | List active policy violations
GET | /api/audit/deployments | M4 | Query deployment audit trail
GET | /api/services | All | List all services with health overview
GET | /api/services/{name}/health | All | Detailed service health and compliance
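
Because several endpoints carry path parameters such as `{name}` and `{service}`, the mock API layer needs some way to match a concrete path against these patterns. The following is one possible sketch, not the specified implementation: `Route`, `routes`, and `matchRoute` are hypothetical names, and only a subset of the table is registered.

```typescript
type Method = "GET" | "POST" | "PUT" | "DELETE";

interface Route {
  method: Method;
  pattern: string; // path with {param} placeholders, as in the table above
  module: string;
}

// A subset of the endpoint table, for illustration.
const routes: Route[] = [
  { method: "GET", pattern: "/api/flags", module: "M1" },
  { method: "DELETE", pattern: "/api/flags/{name}", module: "M1" },
  { method: "GET", pattern: "/api/pipelines/runs/{id}", module: "M2" },
  { method: "GET", pattern: "/api/compliance/readiness/{service}", module: "M4" },
];

interface RouteMatch {
  route: Route;
  params: { [key: string]: string };
}

// Compares the path segment by segment; {param} segments capture a non-empty value.
function matchRoute(method: Method, path: string): RouteMatch | undefined {
  const pathParts = path.split("/");
  for (const route of routes) {
    if (route.method !== method) continue;
    const patternParts = route.pattern.split("/");
    if (patternParts.length !== pathParts.length) continue;

    const params: { [key: string]: string } = {};
    let matched = true;
    for (let i = 0; i < patternParts.length; i++) {
      const expected = patternParts[i] ?? "";
      const actual = pathParts[i] ?? "";
      const isParam =
        expected.charAt(0) === "{" && expected.charAt(expected.length - 1) === "}";
      if (isParam && actual !== "") {
        params[expected.slice(1, -1)] = decodeURIComponent(actual);
      } else if (expected !== actual) {
        matched = false;
        break;
      }
    }
    if (matched) return { route: route, params: params };
  }
  return undefined;
}
```

The returned `module` value makes it straightforward to dispatch a request to the matching service file.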

Azure → AWS Service Mapping

Capability | Azure Version | AWS Equivalent
Feature Flags | Azure App Configuration Feature Manager | AWS AppConfig Feature Flags
CI/CD Pipelines | Azure DevOps Pipelines (YAML) | AWS CodePipeline / GitHub Actions
Container Registry | Azure Container Registry (ACR) | Amazon ECR
Kubernetes | Azure Kubernetes Service (AKS) | Amazon EKS
Policy Engine | Azure Policy | AWS Config Rules / SCPs
Security Posture | Microsoft Defender for Cloud | AWS Security Hub
Secrets Management | Azure Key Vault | AWS Secrets Manager
Monitoring / APM | Application Insights + Azure Monitor | CloudWatch + X-Ray
Traffic Management | Azure Traffic Manager | Route 53 weighted routing
Service Mesh | Istio on AKS (self-managed or the Istio-based service mesh add-on) | AWS App Mesh
Artifact Repository | Azure Artifacts | AWS CodeArtifact
Work Item Tracking | Azure Boards | Jira / AWS CodeCatalyst
Code Repository | Azure Repos | AWS CodeCommit / GitHub
Log Analytics | Azure Log Analytics (KQL) | CloudWatch Logs Insights