Feature Lifecycle — IDP Module Specification (Azure Edition)

Feature Lifecycle Phases

Every feature follows a structured lifecycle from inception to retirement. Each phase maps to specific IDP modules, Azure DevOps pipelines, App Configuration feature flags, and compliance gates.

Author (PR & Code Review) → Build (CI/CD Pipeline) → Gate (Compliance Check) → Flag (Controlled Rollout) → Retire (Cleanup Flags)

Lifecycle phases explained

What is done in each phase of the feature lifecycle on Azure.

Author

Code and changes are authored and submitted via pull requests. Branch policies and code review (e.g. in Azure Repos or GitHub) ensure quality and alignment with standards before merging.

Build

Azure DevOps Pipelines (or equivalent) run CI: build, test, and package the application. Artifacts are published, and container images may be built and pushed to Azure Container Registry (ACR). Quality and security gates, such as static analysis and image vulnerability scanning, run in the pipeline.

Gate

Pre-deployment compliance checks: Azure Policy, Defender for Cloud findings, and pipeline gates must pass. Deployment windows and approval steps ensure only compliant changes are released.

Flag

Feature flags control the rollout (e.g. Azure App Configuration Feature Manager). Teams target audiences, set rollout percentages, and can turn a feature off with a kill switch or roll it back without redeploying.

Retire

Feature flags and related configuration are cleaned up once the feature is fully rolled out or deprecated. Flags are removed from App Configuration and code paths are simplified.

Target Platform

This specification targets Microsoft Azure with Azure DevOps Pipelines for CI/CD, Azure App Configuration Feature Manager for feature flags, AKS for compute, and Azure Policy + Defender for Cloud for compliance gates.

Module Summary

MODULE 1

Feature Flagging & Controlled Rollouts

Azure App Configuration Feature Manager, canary/blue-green deployments on AKS, progressive rollout patterns, and kill switches.

MODULE 2

CI/CD Pipeline Automation

Multi-stage Azure DevOps YAML pipelines, artifact management, Helm deployments to AKS, Key Vault integration, and pipeline templates.

MODULE 3

PR Approval Workflows & Codified CAB

Branch policies, build validation, codified Change Advisory Board with risk-based auto-approval, and quality gates.

MODULE 4

Deployment Compliance Gates

Azure Policy enforcement, Defender for Cloud integration, deployment blackout windows, smoke tests, and immutable audit trails.

01 Feature Flagging & Controlled Rollouts

Feature Flag Lifecycle

Every feature flag follows a strict lifecycle managed through Azure App Configuration Feature Manager. Flags transition through four phases:

Create (define flag & filters) → Target (percentage/user targeting) → Rollout (progressive 1→100%) → Retire (remove flag debt)

Azure App Configuration Feature Manager (Required)

All feature flags are centrally managed in Azure App Configuration using the Feature Manager capability. This provides a single source of truth for flag state across all environments.

Targeting filter: percentage-based rollouts and user/group targeting
Time window filter: auto-enable/disable flags within a date range
Custom filters: evaluate flag state based on arbitrary application context
Feature flags are environment-scoped — separate App Configuration instances per environment
Flag changes require PR approval in the configuration-as-code repo
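The filters above decide, per request, whether a flag is on. As a rough mental model, the Python sketch below shows deterministic percentage and group targeting plus a time-window check. It is a simplified illustration, not the Microsoft.FeatureManagement algorithm: the SHA-256 bucketing, the `TargetingRules` and `GroupRollout` types, and the end-exclusive window are assumptions chosen for clarity.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class GroupRollout:
    name: str
    rollout_percentage: float  # 0-100


@dataclass
class TargetingRules:
    users: list[str] = field(default_factory=list)  # always targeted
    groups: list[GroupRollout] = field(default_factory=list)
    default_rollout_percentage: float = 0.0  # 0-100


def bucket(flag_name: str, user_id: str) -> float:
    """Map (user, flag) to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{user_id}\n{flag_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32 * 100


def is_targeted(flag_name: str, rules: TargetingRules,
                user_id: str, user_groups: list[str]) -> bool:
    if user_id in rules.users:
        return True
    # One bucket per user and flag: raising a percentage only ever adds users.
    user_bucket = bucket(flag_name, user_id)
    for group in rules.groups:
        if group.name in user_groups and user_bucket < group.rollout_percentage:
            return True
    return user_bucket < rules.default_rollout_percentage


def in_time_window(now: datetime, start: datetime | None,
                   end: datetime | None) -> bool:
    """Time window filter: on from start (inclusive) until end (exclusive)."""
    if start is not None and now < start:
        return False
    if end is not None and now >= end:
        return False
    return True
```

Because each user maps to a fixed bucket, raising a rollout from 5% to 25% keeps the original 5% of users enabled. That stability is what makes a progressive rollout consistent across requests.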

Deployment Strategies on AKS

Canary Deployment (Flagger + Istio)

Use Flagger with the Istio service mesh for automated canary analysis. Flagger gradually shifts traffic from the stable (primary) version to the canary version and advances only while the success-rate and latency checks pass.

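Canary weight progression: 1% → 5% → 25% → 50%, then promotion to 100%
Automatic rollback: a check fails when the request success rate drops below 99% (error rate above 1%) or p99 latency exceeds 500 ms, and the canary is rolled back after 5 failed checks
Canary analysis interval: 60 seconds
Metrics: request success rate and p99 latency. Flagger's built-in Istio checks read these from Prometheus (for example, Azure Monitor managed Prometheus), not from Application Insights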
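The analysis rules above can be modeled as a small loop. This Python sketch replays one metric sample per 60-second interval: a healthy sample advances the weight one step, an unhealthy one holds the weight and counts a failed check, and reaching the failure threshold rolls back. It is a simplified model, not Flagger's implementation. The default thresholds (99% success rate, 500 ms p99, 5 failed checks) are taken from the Flagger Canary resource in this module. `Sample` and `run_canary` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    success_rate: float    # % of requests that did not fail, e.g. 99.4
    p99_latency_ms: float


def run_canary(samples: list[Sample],
               step_weights: tuple[int, ...] = (1, 5, 25, 50),
               min_success_rate: float = 99.0,
               max_p99_ms: float = 500.0,
               failure_threshold: int = 5) -> tuple[str, list[int]]:
    """Replay one metric sample per analysis interval.

    Returns the outcome ("promoted", "rolled_back" or "in_progress") and the
    sequence of canary weights that were applied.
    """
    weights: list[int] = []
    failed_checks = 0
    for sample in samples:
        healthy = (sample.success_rate >= min_success_rate
                   and sample.p99_latency_ms <= max_p99_ms)
        if not healthy:
            failed_checks += 1
            if failed_checks >= failure_threshold:
                return "rolled_back", weights + [0]  # all traffic back to stable
            continue  # hold the current weight and check again next interval
        if len(weights) < len(step_weights):
            weights.append(step_weights[len(weights)])  # advance one step
        else:
            return "promoted", weights + [100]  # last step passed: promote
    return "in_progress", weights
```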

Blue-Green Deployment

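Run two identical environments (blue/green) and switch traffic between them with Azure Traffic Manager (DNS level) or Nginx Ingress canary annotations (cluster level). Switch only after the green environment passes validation. The ingress switch takes effect at once, while a DNS switch completes as cached answers expire.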

Azure Traffic Manager weighted routing for DNS-level traffic split
Nginx Ingress canary-weight annotation for cluster-level routing
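Rollback by setting the Nginx canary-weight back to 0, or by disabling the green Traffic Manager endpoint (Traffic Manager weights range from 1 to 1000, so 0 is not a valid weight there)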
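Switching or rolling back through the Nginx Ingress canary annotations comes down to writing one weight value. The Python sketch below builds the patch body for a canary Ingress that fronts the green Service. That Ingress is assumed to exist alongside the primary one, and `green_weight_patch` is a hypothetical helper.

```python
import json


def green_weight_patch(weight: int) -> dict:
    """Annotations patch for an ingress-nginx canary Ingress pointing at green."""
    if not 0 <= weight <= 100:
        raise ValueError("canary-weight must be between 0 and 100")
    return {
        "metadata": {
            "annotations": {
                "nginx.ingress.kubernetes.io/canary": "true",
                # Kubernetes annotation values are always strings.
                "nginx.ingress.kubernetes.io/canary-weight": str(weight),
            }
        }
    }


# JSON bodies for `kubectl patch ingress <green-ingress> -p '<body>'`
switch = json.dumps(green_weight_patch(100))   # all traffic to green
rollback = json.dumps(green_weight_patch(0))   # all traffic back to blue
```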

Flag Naming Conventions (Required)

All feature flags follow the pattern: <team>.<service>.<feature>

Example	Description
`payments.checkout.new-ui`	New checkout UI for payments team
`catalog.search.vector-ranking`	Vector-based search ranking experiment
`orders.fulfillment.batch-processing`	Batch processing mode for fulfillment
`identity.auth.passkey-login`	Passkey authentication experiment
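A check like the one below could run in the configuration-as-code repo's PR validation to enforce the naming pattern. This is a sketch. The pattern does not say which characters a segment may contain, so the regular expression assumes lowercase kebab-case segments, which every example in the table satisfies.

```python
from __future__ import annotations

import re

# One segment: lowercase letters/digits, optionally joined by single hyphens.
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
FLAG_NAME = re.compile(rf"{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}")


def parse_flag_name(name: str) -> tuple[str, str, str]:
    """Split a flag name into (team, service, feature) or raise ValueError."""
    if FLAG_NAME.fullmatch(name) is None:
        raise ValueError(f"{name!r} does not match <team>.<service>.<feature>")
    team, service, feature = name.split(".")
    return team, service, feature
```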

Flag Hygiene & Governance

TTL policy: every flag must have a maximum lifespan (default 90 days)
Stale flag automation: flags exceeding TTL trigger alerts and auto-create cleanup tickets
Audit logging: all flag state changes recorded in Azure Monitor
Flag debt dashboard: weekly report of active flags by team, age, and status
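Stale-flag detection comes down to date arithmetic against the TTL. The Python sketch below lists non-retired flags that have outlived their TTL, most overdue first. The `Flag` type is hypothetical, and creating the cleanup tickets is left to whatever tracker integration the platform uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

DEFAULT_TTL = timedelta(days=90)  # default maximum lifespan from the TTL policy


@dataclass
class Flag:
    name: str
    created: date
    status: str  # "active" | "canary" | "retired"
    ttl: timedelta = DEFAULT_TTL


def stale_flags(flags: list[Flag], today: date) -> list[tuple[str, int]]:
    """Return (name, days past TTL) for non-retired flags, most overdue first."""
    overdue = []
    for flag in flags:
        if flag.status == "retired":
            continue
        days_over = (today - (flag.created + flag.ttl)).days
        if days_over > 0:
            overdue.append((flag.name, days_over))
    return sorted(overdue, key=lambda item: item[1], reverse=True)
```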

Emergency Kill Switches (Required)

Production Incident Protocol

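Every feature deployed behind a flag must have a kill switch. A kill switch disables the feature without a deployment. It is toggled through the Azure App Configuration REST API, the Azure CLI, or the Azure Portal, and takes effect when each client's next configuration refresh picks up the change (typically within one refresh interval, 30 seconds in the Python example in this module).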
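Flipping a kill switch programmatically means rewriting the flag's `enabled` field in the store. The sketch below uses the `azure-appconfiguration` Python SDK. It assumes the caller holds the App Configuration Data Owner role and that the label matches the environment label the flag was created with. `kill_switch` and `flag_key` are hypothetical helper names.

```python
FEATURE_FLAG_PREFIX = ".appconfig.featureflag/"  # reserved key prefix for flags


def flag_key(name: str) -> str:
    """Key under which App Configuration stores a feature flag."""
    return FEATURE_FLAG_PREFIX + name


def kill_switch(endpoint: str, name: str, label: str) -> None:
    """Disable a feature flag in place; clients see it on their next refresh."""
    # Imported here so flag_key() can be used without the Azure SDK installed.
    from azure.appconfiguration import (
        AzureAppConfigurationClient,
        FeatureFlagConfigurationSetting,
    )
    from azure.identity import DefaultAzureCredential

    client = AzureAppConfigurationClient(endpoint, DefaultAzureCredential())
    setting = client.get_configuration_setting(key=flag_key(name), label=label)
    if not isinstance(setting, FeatureFlagConfigurationSetting):
        raise TypeError(f"{flag_key(name)} is not a feature flag")
    setting.enabled = False
    client.set_configuration_setting(setting)
```

For example, `kill_switch("https://appconf-prod-checkout.azconfig.io", "payments.checkout.new-ui", label="prod")`, where the `prod` label is an assumed value. The Azure CLI offers the same operation through `az appconfig feature disable`.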

Observability Integration

Correlate feature flag state with application metrics using Application Insights custom dimensions. Tagging every request with its active feature flags enables:

Error rate comparison: flag-on vs. flag-off cohorts
Performance impact: latency delta per flag
Business metrics: conversion rate per variant
Custom KQL queries in Log Analytics for flag-based analysis
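Once requests carry flag state, the flag-on versus flag-off comparison is a grouped ratio. The Python sketch below computes it over request records shaped like Application Insights request rows. The `FeatureFlag.<name>` custom-dimension naming is a hypothetical convention. In production, the same aggregation would normally run as a KQL query in Log Analytics.

```python
from __future__ import annotations

from collections import defaultdict


def error_rate_by_cohort(requests: list[dict], flag: str) -> dict[str, float]:
    """Error rate for requests served with the flag "on" vs. "off".

    Each request is assumed to look like
    {"success": bool, "customDimensions": {"FeatureFlag.<name>": "on" | "off"}}.
    """
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    key = f"FeatureFlag.{flag}"
    for request in requests:
        cohort = request.get("customDimensions", {}).get(key)
        if cohort not in ("on", "off"):
            continue  # request was not tagged with this flag
        totals[cohort] += 1
        if not request["success"]:
            errors[cohort] += 1
    return {cohort: errors[cohort] / totals[cohort] for cohort in totals}
```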

Terraform: Azure App Configuration with Feature Flags

modules/feature-flags/main.tf

# modules/feature-flags/main.tf
resource "azurerm_app_configuration" "flags" {
  name                = "appconf-${var.environment}-${var.service_name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"

  identity {
    type = "SystemAssigned"
  }

  tags = var.common_tags
}

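resource "azurerm_app_configuration_feature" "feature" {
  for_each               = { for f in var.feature_flags : f.name => f }
  configuration_store_id = azurerm_app_configuration.flags.id
  name                   = each.value.name
  label                  = var.environment
  enabled                = each.value.enabled
  description            = each.value.description

  # Flags are written through the data plane, so the deploying principal
  # needs its data-owner role before any flag is created.
  depends_on = [azurerm_role_assignment.deployer_data_owner]

  dynamic "targeting_filter" {
    for_each = each.value.targeting != null ? [each.value.targeting] : []
    content {
      default_rollout_percentage = targeting_filter.value.percentage
      dynamic "groups" {
        for_each = targeting_filter.value.groups
        content {
          name               = groups.value.name
          rollout_percentage = groups.value.percentage
        }
      }
    }
  }

  dynamic "timewindow_filter" {
    for_each = each.value.time_window != null ? [each.value.time_window] : []
    content {
      start = timewindow_filter.value.start
      end   = timewindow_filter.value.end
    }
  }
}

# Principal running Terraform (e.g. the pipeline's service connection)
data "azurerm_client_config" "current" {}

# Allow Terraform to create and update flags in the store
resource "azurerm_role_assignment" "deployer_data_owner" {
  scope                = azurerm_app_configuration.flags.id
  role_definition_name = "App Configuration Data Owner"
  principal_id         = data.azurerm_client_config.current.object_id
}

# Grant AKS managed identity read access
resource "azurerm_role_assignment" "aks_reader" {
  scope                = azurerm_app_configuration.flags.id
  role_definition_name = "App Configuration Data Reader"
  principal_id         = var.aks_managed_identity_principal_id
}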
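The module reads `name`, `enabled`, `description`, `targeting.percentage`, `targeting.groups[*].name` and `.percentage`, and `time_window.start` and `.end` from `var.feature_flags`, but its variable declaration is not shown. The Python sketch below assumes that shape and writes the list as the body of a `*.auto.tfvars.json` file. Before Terraform runs, it rejects two mistakes: duplicate names (the `for_each` map keys must be unique) and rollout percentages outside 0 to 100. The function names are hypothetical.

```python
from __future__ import annotations

import json


def validate_flags(flags: list[dict]) -> None:
    """Raise ValueError for inputs that should never reach terraform plan."""
    seen: set[str] = set()
    for flag in flags:
        name = flag["name"]
        if name in seen:
            raise ValueError(f"duplicate flag name {name!r}: for_each keys must be unique")
        seen.add(name)
        targeting = flag.get("targeting")
        if targeting is None:
            continue
        percentages = [targeting["percentage"]]
        percentages += [group["percentage"] for group in targeting.get("groups", [])]
        if any(not 0 <= p <= 100 for p in percentages):
            raise ValueError(f"{name}: rollout percentages must be between 0 and 100")


def to_tfvars_json(flags: list[dict]) -> str:
    """Serialize flags as the body of a *.auto.tfvars.json file."""
    validate_flags(flags)
    # Give every object the same attributes; absent filters become null.
    normalized = [{"targeting": None, "time_window": None, **flag} for flag in flags]
    return json.dumps({"feature_flags": normalized}, indent=2)
```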

Azure DevOps Pipeline: Canary Deployment with Flagger

pipelines/canary-deploy.yaml

# pipelines/canary-deploy.yaml
trigger:
  branches:
    include: [main]

pool:
  vmImage: 'ubuntu-latest'

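variables:
  - group: acr-credentials
  - name: acrName
    value: 'acmeacr'
  - name: imageRepository
    value: '$(Build.Repository.Name)'
  - name: tag
    value: '$(Build.BuildId)'
  - name: team        # Kubernetes namespace (example value)
    value: 'payments'
  - name: service     # Deployment and Canary name (example value)
    value: 'checkout-service'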

stages:
  - stage: Build
    displayName: 'Build & Push Image'
    jobs:
      - job: BuildImage
        steps:
          - task: Docker@2
            displayName: 'Build and push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(imageRepository)'
              command: 'buildAndPush'
              Dockerfile: '**/Dockerfile'
              tags: |
                $(tag)
                latest

  - stage: DeployCanary
    displayName: 'Deploy Canary to AKS'
    dependsOn: Build
    jobs:
      - deployment: CanaryDeploy
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@1
                  displayName: 'Deploy new image (Flagger starts canary analysis)'
                  inputs:
                    action: 'deploy'
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    # Substituted into matching containers in the manifests
                    containers: |
                      $(acrName).azurecr.io/$(imageRepository):$(tag)
                    manifests: |
                      k8s/deployment.yaml

  - stage: ValidateCanary
    displayName: 'Validate Canary Health'
    dependsOn: DeployCanary
    jobs:
      - job: HealthCheck
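        steps:
          # Plain script steps have no cluster credentials; log in first
          - task: Kubernetes@1
            displayName: 'kubectl login'
            inputs:
              connectionType: 'Kubernetes Service Connection'
              kubernetesServiceEndpoint: 'aks-prod'
              command: 'login'
          - script: |
              echo "Waiting for Flagger canary analysis..."
              # Timeout must exceed interval x number of steps plus promotion time
              kubectl -n $(team) wait canary/$(service) \
                --for=condition=promoted --timeout=900s
            displayName: 'Wait for Flagger promotion'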
          - script: |
              CANARY_STATUS=$(kubectl -n $(team) get canary/$(service) \
                -o jsonpath='{.status.phase}')
              if [ "$CANARY_STATUS" != "Succeeded" ]; then
                echo "##vso[task.logissue type=error]Canary failed: $CANARY_STATUS"
                exit 1
              fi
            displayName: 'Verify canary status'

Blue-Green Deployment Pipeline

pipelines/blue-green-deploy.yaml

# pipelines/blue-green-deploy.yaml
stages:
  - stage: DeployGreen
    displayName: 'Deploy to Green Slot'
    jobs:
      - deployment: GreenDeploy
        environment: 'aks-production.green'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  displayName: 'Helm upgrade green'
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-prod'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)-green'
                    overrideValues: |
                      image.tag=$(tag)
                      slot=green

  - stage: SmokeTestGreen
    displayName: 'Smoke Test Green'
    dependsOn: DeployGreen
    jobs:
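      - job: SmokeTest
        steps:
          - task: Kubernetes@1
            displayName: 'kubectl login'
            inputs:
              connectionType: 'Kubernetes Service Connection'
              kubernetesServiceEndpoint: 'aks-prod'
              command: 'login'
          - script: |
              # Cluster-local DNS names resolve only inside the cluster, so the
              # check runs from a short-lived pod rather than on the hosted agent
              kubectl -n $(team) run smoke-green-$(Build.BuildId) --rm -i \
                --restart=Never --image=curlimages/curl --command -- \
                curl -sf http://$(service)-green.$(team).svc.cluster.local/health
            displayName: 'Health check green'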

  - stage: SwitchTraffic
    displayName: 'Switch Traffic to Green'
    dependsOn: SmokeTestGreen
    jobs:
      - deployment: TrafficSwitch
        environment: 'aks-production'
        strategy:
          runOnce:
            deploy:
              steps:
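                - task: Kubernetes@1
                  displayName: 'kubectl login'
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceEndpoint: 'aks-prod'
                    command: 'login'
                - script: |
                    # Patch the canary Ingress that fronts the green Service
                    # (assumed name: $(service)-green). The primary Ingress for
                    # the same host keeps pointing at blue, so weight 0 rolls back.
                    kubectl -n $(team) patch ingress $(service)-green \
                      -p '{"metadata":{"annotations":{
                        "nginx.ingress.kubernetes.io/canary":"true",
                        "nginx.ingress.kubernetes.io/canary-weight":"100"
                      }}}'
                  displayName: 'Route 100% to green'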

.NET SDK Integration: Microsoft.FeatureManagement

src/Startup.cs

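// src/Startup.cs
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.FeatureManagement;
using Microsoft.FeatureManagement.FeatureFilters;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();

        // Register the services that UseAzureAppConfiguration() needs for dynamic
        // refresh. The store connection itself (options.Connect(...).UseFeatureFlags())
        // is configured in Program.cs via ConfigureAppConfiguration.
        services.AddAzureAppConfiguration();

        // Add feature management with targeting, time window, and percentage filters
        services.AddFeatureManagement()
            .AddFeatureFilter<TargetingFilter>()
            .AddFeatureFilter<TimeWindowFilter>()
            .AddFeatureFilter<PercentageFilter>();

        // TargetingFilter gets the current user from ITargetingContextAccessor.
        // HttpContextTargetingContextAccessor is an application-defined class
        // (not part of the library) that reads HttpContext.User.
        services.AddHttpContextAccessor();
        services.AddSingleton<ITargetingContextAccessor,
            HttpContextTargetingContextAccessor>();
    }

    public void Configure(IApplicationBuilder app)
    {
        // Trigger dynamic configuration refresh on incoming requests
        app.UseAzureAppConfiguration();

        app.UseRouting();
        app.UseEndpoints(endpoints => endpoints.MapControllers());
    }
}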

// Usage in a controller
[ApiController]
[Route("api/[controller]")]
public class CheckoutController : ControllerBase
{
    private readonly IFeatureManager _featureManager;

    public CheckoutController(IFeatureManager featureManager)
    {
        _featureManager = featureManager;
    }

    [HttpGet]
    public async Task<IActionResult> GetCheckout()
    {
        if (await _featureManager
            .IsEnabledAsync("payments.checkout.new-ui"))
        {
            return Ok(new { version = "v2", ui = "new" });
        }
        return Ok(new { version = "v1", ui = "classic" });
    }
}

Python SDK Integration

src/feature_flags.py

# src/feature_flags.py
from azure.appconfiguration.provider import load
from azure.identity import DefaultAzureCredential
from featuremanagement import FeatureManager, TargetingContext

credential = DefaultAzureCredential()

# Load configuration with feature flags
config = load(
    endpoint="https://appconf-prod-checkout.azconfig.io",
    credential=credential,
    feature_flag_enabled=True,
    feature_flag_refresh_enabled=True,
    refresh_interval=30,  # seconds
)

feature_manager = FeatureManager(config)


def get_checkout_experience(user_id: str, groups: list[str]) -> dict:
    """Return checkout experience based on feature flag."""
    # Refresh is not automatic: refresh() contacts the store only once
    # refresh_interval has elapsed since the last refresh.
    config.refresh()

    # The targeting filter needs the user's identity and group membership
    targeting = TargetingContext(user_id=user_id, groups=groups)
    if feature_manager.is_enabled("payments.checkout.new-ui", targeting):
        return {"version": "v2", "ui": "new"}
    return {"version": "v1", "ui": "classic"}

Flagger Canary Resource

k8s/canary.yaml

# k8s/canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-service
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  service:
    port: 8080
    targetPort: 8080
    gateways:
      - istio-system/public-gateway
    hosts:
      - checkout.acme.com
  analysis:
    interval: 60s
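    threshold: 5          # failed checks before rollback
    # Non-linear schedule matching the 1% -> 5% -> 25% -> 50% progression;
    # after the last step passes, Flagger promotes the canary (100%)
    stepWeights: [1, 5, 25, 50]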
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 60s
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 60s
    webhooks:
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.istio-system/
        metadata:
          cmd: "hey -z 60s -q 10 -c 2 http://checkout-service-canary.payments:8080/"
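Flagger's documentation gives rules of thumb for how long an analysis takes. With linear steps, a successful run lasts at least interval × (maxWeight / stepWeight), and sustained failures trigger a rollback after about interval × threshold. The Python helper below computes both. For a `stepWeights` list, it uses interval × the number of entries, which is an assumption based on the same reasoning. It ignores promotion and webhook time.

```python
from __future__ import annotations

import math


def analysis_durations(interval_s: int, threshold: int,
                       step_weight: int | None = None,
                       max_weight: int | None = None,
                       step_weights: list[int] | None = None) -> tuple[int, int]:
    """Return (minimum seconds until promotion, seconds until rollback)."""
    if step_weights:
        steps = len(step_weights)
    elif step_weight and max_weight:
        steps = math.ceil(max_weight / step_weight)
    else:
        raise ValueError("give step_weights, or step_weight and max_weight")
    return interval_s * steps, interval_s * threshold
```

With a 60-second interval and a threshold of 5, rollback takes about 300 seconds. A linear 5%/50% schedule needs at least 600 seconds, so any pipeline step that waits for promotion needs a longer timeout than that.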

Azure Traffic Manager: Weighted Routing

modules/traffic-manager/main.tf

# modules/traffic-manager/main.tf
resource "azurerm_traffic_manager_profile" "canary" {
  name                   = "tm-${var.service_name}-canary"
  resource_group_name    = var.resource_group_name
  traffic_routing_method = "Weighted"

  dns_config {
    relative_name = var.service_name
    ttl           = 30
  }

  monitor_config {
    protocol                     = "HTTPS"
    port                         = 443
    path                         = "/health"
    interval_in_seconds          = 10
    timeout_in_seconds           = 5
    tolerated_number_of_failures = 2
  }

  tags = var.common_tags
}

resource "azurerm_traffic_manager_azure_endpoint" "stable" {
  name               = "stable"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.stable_public_ip_id
  weight             = var.stable_weight  # e.g., 95
}

resource "azurerm_traffic_manager_azure_endpoint" "canary" {
  name               = "canary"
  profile_id         = azurerm_traffic_manager_profile.canary.id
  target_resource_id = var.canary_public_ip_id
  weight             = var.canary_weight  # e.g., 5
}
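Weighted routing in Traffic Manager picks an endpoint at random for each DNS query, with a probability proportional to its weight. Resolvers and clients then cache that answer for the profile TTL (30 seconds here). The Python simulation below illustrates the per-query split for the 95/5 example weights. It models DNS answers, not user sessions, so the traffic share the application actually sees can drift from these numbers while answers are cached. The function names are hypothetical.

```python
from __future__ import annotations

import random


def pick_endpoint(weights: dict[str, int], rng: random.Random) -> str:
    """Choose an endpoint with probability weight / sum(weights)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


def canary_share(weights: dict[str, int], queries: int, seed: int = 0) -> float:
    """Fraction of simulated DNS answers that point at the canary endpoint."""
    rng = random.Random(seed)
    hits = sum(pick_endpoint(weights, rng) == "canary" for _ in range(queries))
    return hits / queries
```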

Glossary

Term	Definition
Feature Flag	A runtime toggle that controls feature visibility without code deployment. Stored in Azure App Configuration.
Feature Filter	A rule that determines whether a flag is enabled for a given context (targeting, time window, custom).
Targeting	Evaluating flag state based on user identity, group membership, or percentage rollout.
Canary Deployment	Gradually shifting traffic to a new version while monitoring health metrics. Uses Flagger + Istio on AKS.
Blue-Green Deployment	Running two identical environments and switching traffic atomically between them.
Progressive Rollout	Incrementally increasing the percentage of users exposed to a feature: 1% → 5% → 25% → 50% → 100%.
Kill Switch	An emergency flag-disable mechanism that takes effect on each client's next App Configuration refresh, without a redeployment.
Flag Debt	Accumulated feature flags that have outlived their purpose but remain in code and configuration.
Variant	A specific configuration value associated with a feature flag, enabling A/B testing with multiple options.
A/B Test	An experiment comparing two or more variants of a feature using statistical analysis.
Azure App Configuration	Centralized Azure service for managing application settings and feature flags.
Feature Manager	The App Configuration capability that provides feature flag evaluation with filters.
Flagger	A Kubernetes operator that automates canary, A/B testing, and blue-green deployments.
Istio	An open-source service mesh providing traffic management, security, and observability for Kubernetes.
Traffic Manager	Azure DNS-based traffic load balancer supporting weighted, priority, and geographic routing.
Application Insights	Azure APM service for monitoring live applications, detecting anomalies, and diagnosing issues.

Component: FeatureFlagDashboard

Screen Layout: A data table listing all feature flags with columns for name, status (active/canary/retired), targeting rules, rollout percentage progress bar, environment scope, TTL remaining, and last modified date. Filter controls for team, status, and environment. Bulk actions for enable/disable/archive. Each row expandable to show targeting details and flag audit history.

Component: CanaryMonitor

Screen Layout: Real-time dashboard showing active canary deployments. Each canary displays a progress gauge (current weight %), request success rate chart, p99 latency sparkline, and Flagger phase indicator (Initializing → Progressing → Promoting → Succeeded/Failed). Error budget burn-down visualization.

API Endpoints

Method	Endpoint	Description
`GET`	`/api/flags`	List all feature flags with state and targeting
`POST`	`/api/flags`	Create a new feature flag
`PUT`	`/api/flags/{name}`	Update flag state or targeting rules
`DELETE`	`/api/flags/{name}`	Retire and remove a feature flag
`GET`	`/api/canary/{service}`	Get canary deployment status and metrics
`POST`	`/api/canary/{service}/rollback`	Trigger immediate canary rollback

Mock Data

mock-data/feature-flags.json

{
  "feature_flags": [
    { "name": "payments.checkout.new-ui", "status": "active", "rollout_pct": 50, "env": "production", "ttl_days_remaining": 45, "created": "2025-12-01" },
    { "name": "catalog.search.vector-ranking", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 82, "created": "2025-12-15" },
    { "name": "orders.fulfillment.batch-processing", "status": "active", "rollout_pct": 100, "env": "production", "ttl_days_remaining": 12, "created": "2025-10-01" },
    { "name": "identity.auth.passkey-login", "status": "active", "rollout_pct": 25, "env": "staging", "ttl_days_remaining": 67, "created": "2025-12-20" },
    { "name": "payments.refunds.instant-refund", "status": "canary", "rollout_pct": 1, "env": "production", "ttl_days_remaining": 88, "created": "2026-01-05" },
    { "name": "catalog.recommendations.ml-v3", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-08-01" },
    { "name": "orders.tracking.real-time-map", "status": "active", "rollout_pct": 75, "env": "production", "ttl_days_remaining": 30, "created": "2025-11-15" },
    { "name": "identity.mfa.biometric-prompt", "status": "active", "rollout_pct": 10, "env": "staging", "ttl_days_remaining": 60, "created": "2025-12-28" },
    { "name": "payments.pricing.dynamic-discount", "status": "canary", "rollout_pct": 5, "env": "production", "ttl_days_remaining": 85, "created": "2026-01-10" },
    { "name": "catalog.images.webp-optimization", "status": "retired", "rollout_pct": 0, "env": "all", "ttl_days_remaining": 0, "created": "2025-07-01" }
  ]
}
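The flag debt dashboard's core numbers can be derived directly from records in this shape. The Python sketch below counts the flags still in code per team (the team is the first segment of `<team>.<service>.<feature>`) and lists flags whose TTL runs out soon. The 14-day window is an assumed threshold, and `SAMPLE_FLAGS` holds four records copied from the mock data above.

```python
from __future__ import annotations

import json
from collections import Counter

SAMPLE_FLAGS = [
    {"name": "payments.checkout.new-ui", "status": "active", "rollout_pct": 50,
     "env": "production", "ttl_days_remaining": 45, "created": "2025-12-01"},
    {"name": "orders.fulfillment.batch-processing", "status": "active", "rollout_pct": 100,
     "env": "production", "ttl_days_remaining": 12, "created": "2025-10-01"},
    {"name": "payments.refunds.instant-refund", "status": "canary", "rollout_pct": 1,
     "env": "production", "ttl_days_remaining": 88, "created": "2026-01-05"},
    {"name": "catalog.recommendations.ml-v3", "status": "retired", "rollout_pct": 0,
     "env": "all", "ttl_days_remaining": 0, "created": "2025-08-01"},
]


def load_flags(path: str) -> list[dict]:
    """Read the "feature_flags" array from a file like mock-data/feature-flags.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["feature_flags"]


def team_of(flag_name: str) -> str:
    return flag_name.split(".", 1)[0]


def debt_by_team(flags: list[dict]) -> Counter:
    """Number of flags still in code (anything not retired) per team."""
    return Counter(team_of(f["name"]) for f in flags if f["status"] != "retired")


def expiring_soon(flags: list[dict], within_days: int = 14) -> list[str]:
    """Non-retired flags whose TTL runs out within the given number of days."""
    return [f["name"] for f in flags
            if f["status"] != "retired" and f["ttl_days_remaining"] <= within_days]
```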

02 CI/CD Pipeline Automation

Pipeline Architecture (Required)

All CI/CD pipelines use Azure DevOps multi-stage YAML pipelines stored in the application repository. Pipelines follow a four-stage model:

Build (compile & test) → Test (quality gates) → Staging (pre-production) → Production (live deployment)

Branch Strategy

Trunk-based development with short-lived feature branches. Release branches for hotfixes only.

main — production-ready trunk, CI triggers on merge
feature/* — short-lived branches (< 3 days), PR required for merge
release/* — created from main for hotfix releases only
No long-lived develop branches — feature flags replace them
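One way to keep the branch strategy honest is a scheduled audit of open branches. The Python sketch below checks a branch's name and age against the rules above. The allowed-name regular expression and the `branch_violations` helper are assumptions, not an existing Azure Repos feature.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

MAX_FEATURE_BRANCH_AGE = timedelta(days=3)  # "short-lived branches (< 3 days)"
ALLOWED_BRANCH = re.compile(r"main|feature/[\w./-]+|release/[\w./-]+")


def branch_violations(branch: str, created: datetime, now: datetime) -> list[str]:
    """Return human-readable policy violations for one branch (empty if none)."""
    problems = []
    if ALLOWED_BRANCH.fullmatch(branch) is None:
        problems.append(f"{branch}: only main, feature/* and release/* are allowed")
    if branch.startswith("feature/") and now - created >= MAX_FEATURE_BRANCH_AGE:
        problems.append(f"{branch}: feature branches must merge within 3 days")
    return problems
```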

Pipeline Triggers & Approvals

CI trigger: automatic on PR merge to main
Scheduled builds: nightly for integration tests and security scans
Environment approvals: staging auto-deploys; production requires manual approval
Approval checks: branch protection, required reviews, artifact verification

Artifact Management

Use Azure Artifacts for package feeds (NuGet, npm) and Azure Container Registry (ACR) for Docker images.

All artifacts are versioned with build ID and Git SHA
ACR geo-replicated to eastus2 and westus2 for redundancy
Image vulnerability scanning enabled via Defender for Containers
Retention policy: keep last 30 production images, purge untagged after 7 days
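The retention rule combines two conditions, which the Python sketch below expresses directly: keep the 30 most recently pushed production images, purge older production images, and purge untagged images older than 7 days. The policy does not say how tagged non-production images are handled, so the sketch keeps them. The `Image` type and the function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Image:
    digest: str
    pushed: datetime
    tags: list[str]
    production: bool = False  # image has been deployed to production


def images_to_purge(images: list[Image], now: datetime,
                    keep_production: int = 30,
                    untagged_max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return digests that the retention policy would delete."""
    newest_production = sorted((i for i in images if i.production),
                               key=lambda i: i.pushed, reverse=True)[:keep_production]
    keep = {i.digest for i in newest_production}
    purge = []
    for image in images:
        if image.digest in keep:
            continue
        if image.production:
            purge.append(image.digest)  # older than the newest 30 production images
        elif not image.tags and now - image.pushed > untagged_max_age:
            purge.append(image.digest)  # untagged for more than 7 days
    return purge
```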

Secret Management in Pipelines (Required)

No Secrets in Pipelines

Never store secrets in pipeline YAML or as plain pipeline variables. Use variable groups linked to Azure Key Vault to inject secrets at runtime. Service connections use workload identity federation (no client secrets).

Pipeline Optimization

Caching: cache NuGet/npm packages, Docker layers between builds
Parallel jobs: run unit tests and lint checks in parallel
Shared templates: central pipeline template repo to avoid duplication
Notifications: Microsoft Teams channel integration for build status

Multi-Stage Pipeline: .NET Microservice (Required)

pipelines/dotnet-microservice.yaml

# pipelines/dotnet-microservice.yaml
trigger:
  branches:
    include: [main]

pr:
  branches:
    include: [main]

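variables:
  - group: keyvault-secrets  # Linked to Azure Key Vault
  - name: buildConfiguration
    value: 'Release'
  - name: acrName
    value: 'acmeacr'
  - name: NUGET_PACKAGES     # NuGet global packages folder, cached by Cache@2
    value: '$(Pipeline.Workspace)/.nuget/packages'
  - name: team               # Kubernetes namespace (example value)
    value: 'payments'
  - name: service            # Helm chart and release name (example value)
    value: 'checkout-service'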

stages:
  - stage: Build
    displayName: 'Build & Unit Test'
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: UseDotNet@2
            inputs:
              version: '8.x'
          - task: Cache@2
            inputs:
              key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
              restoreKeys: 'nuget | "$(Agent.OS)"'
              path: '$(NUGET_PACKAGES)'
          - script: dotnet restore
            displayName: 'Restore packages'
          - script: dotnet build --configuration $(buildConfiguration) --no-restore
            displayName: 'Build'
          - script: dotnet test --configuration $(buildConfiguration) --no-build --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)/coverage
            displayName: 'Run unit tests'
          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: '$(Agent.TempDirectory)/coverage/**/coverage.cobertura.xml'

  - stage: SecurityScan
    displayName: 'Security & Quality'
    dependsOn: Build
    jobs:
      - job: SonarQube
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: SonarQubePrepare@5
            inputs:
              SonarQube: 'sonarqube-connection'
              scannerMode: 'MSBuild'
              projectKey: '$(Build.Repository.Name)'
          - script: dotnet build --configuration $(buildConfiguration)
            displayName: 'Build for analysis'
          - task: SonarQubeAnalyze@5
          - task: SonarQubePublish@5
            inputs:
              pollingTimeoutSec: '300'

      - job: ContainerScan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
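          - task: Docker@2
            displayName: 'Build image'
            inputs:
              # With a registry connection the image is named
              # <registry>/<repository>, matching the scan target below
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'build'
              Dockerfile: '**/Dockerfile'
              tags: 'scan-$(Build.BuildId)'
          - script: |
              # Assumes Trivy is available on the agent image
              trivy image --exit-code 1 --severity HIGH,CRITICAL \
                $(acrName).azurecr.io/$(Build.Repository.Name):scan-$(Build.BuildId)
            displayName: 'Trivy vulnerability scan'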

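  - stage: PushImage
    displayName: 'Push to ACR'
    dependsOn: SecurityScan
    # Only merges to main publish images; PR builds stop after scanning
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))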
    jobs:
      - job: Push
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: Docker@2
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: |
                $(Build.BuildId)
                $(Build.SourceVersion)

  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: PushImage
    jobs:
      - deployment: StagingDeploy
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-staging'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-staging.yaml'

  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: DeployStaging
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: ProductionDeploy
        environment: 'aks-production'  # Has approval gate
        strategy:
          runOnce:
            deploy:
              steps:
                - task: HelmDeploy@0
                  inputs:
                    connectionType: 'Kubernetes Service Connection'
                    kubernetesServiceConnection: 'aks-production'
                    namespace: '$(team)'
                    command: 'upgrade'
                    chartType: 'FilePath'
                    chartPath: 'charts/$(service)'
                    releaseName: '$(service)'
                    overrideValues: 'image.tag=$(Build.BuildId)'
                    valueFile: 'charts/$(service)/values-production.yaml'

Shared Pipeline Template (Extends Pattern)

templates/build-test-deploy.yaml

# templates/build-test-deploy.yaml
# Shared template — consumed via "extends" pattern
parameters:
  - name: service
    type: string
  - name: team
    type: string
  - name: language
    type: string
    values: ['dotnet', 'node', 'python']
  - name: chartPath
    type: string
    default: ''  # empty means charts/<service>

stages:
  - stage: Build
    jobs:
      - job: Build
        steps:
          - template: steps/${{ parameters.language }}-build.yaml
  - stage: Test
    dependsOn: Build
    jobs:
      - job: QualityGate
        steps:
          - template: steps/sonarqube-scan.yaml
          - template: steps/container-scan.yaml
  - stage: DeployStaging
    dependsOn: Test
    jobs:
      - deployment: Staging
        environment: 'aks-staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - template: steps/helm-deploy.yaml
                  parameters:
                    environment: staging
                    service: ${{ parameters.service }}
                    team: ${{ parameters.team }}
                    # Parameter defaults cannot reference other parameters, so the
                    # charts/<service> fallback is computed here (assumes
                    # steps/helm-deploy.yaml declares a chartPath parameter)
                    chartPath: ${{ coalesce(parameters.chartPath, format('charts/{0}', parameters.service)) }}

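# Consumer pipeline (azure-pipelines.yaml in service repo):
# resources:
#   repositories:
#     - repository: pipeline-templates    # alias referenced after "@" below
#       type: git
#       name: platform/pipeline-templates  # hypothetical <project>/<repo>
# extends:
#   template: templates/build-test-deploy.yaml@pipeline-templates
#   parameters:
#     service: checkout-service
#     team: payments
#     language: dotnet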

Terraform Pipeline: Plan on PR, Apply on Merge

pipelines/terraform.yaml

# pipelines/terraform.yaml
trigger:
  branches:
    include: [main]
  paths:
    include: ['infra/**']

pr:
  branches:
    include: [main]
  paths:
    include: ['infra/**']

stages:
  - stage: Plan
    displayName: 'Terraform Plan'
    jobs:
      - job: TerraformPlan
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: TerraformInstaller@1
            inputs:
              terraformVersion: '1.7.x'
          - task: TerraformCLI@1
            displayName: 'Init'
            inputs:
              command: 'init'
              workingDirectory: 'infra/$(environment)'
              backendType: 'azurerm'
              backendServiceArm: 'terraform-sp'
              backendAzureRmResourceGroupName: 'rg-terraform-state'
              backendAzureRmStorageAccountName: 'stterraformstate'
              backendAzureRmContainerName: 'tfstate'
              backendAzureRmKey: '$(environment)/terraform.tfstate'
          - task: TerraformCLI@1
            displayName: 'Plan'
            inputs:
              command: 'plan'
              workingDirectory: 'infra/$(environment)'
              environmentServiceName: 'terraform-sp'
              commandOptions: '-out=tfplan'
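          - task: TerraformCLI@1
            displayName: 'Show Plan'
            inputs:
              command: 'show'
              workingDirectory: 'infra/$(environment)'
              inputTargetPlanOrStateFilePath: 'infra/$(environment)/tfplan'
          # Apply runs on a fresh agent, so hand the plan over as an artifact
          - publish: 'infra/$(environment)/tfplan'
            artifact: tfplan

  - stage: Apply
    displayName: 'Terraform Apply'
    dependsOn: Plan
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: TerraformApply
        environment: 'infra-$(environment)'  # Approval gate
        pool:
          vmImage: 'ubuntu-latest'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out sources by default, and init
                # needs the configuration. The tfplan artifact is downloaded
                # automatically to $(Pipeline.Workspace)/tfplan.
                - checkout: self
                - task: TerraformInstaller@1
                  inputs:
                    terraformVersion: '1.7.x'
                - task: TerraformCLI@1
                  displayName: 'Init'
                  inputs:
                    command: 'init'
                    workingDirectory: 'infra/$(environment)'
                    backendType: 'azurerm'
                    backendServiceArm: 'terraform-sp'
                    backendAzureRmResourceGroupName: 'rg-terraform-state'
                    backendAzureRmStorageAccountName: 'stterraformstate'
                    backendAzureRmContainerName: 'tfstate'
                    backendAzureRmKey: '$(environment)/terraform.tfstate'
                - task: TerraformCLI@1
                  displayName: 'Apply'
                  inputs:
                    command: 'apply'
                    workingDirectory: 'infra/$(environment)'
                    environmentServiceName: 'terraform-sp'
                    commandOptions: '$(Pipeline.Workspace)/tfplan/tfplan'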
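Reviewers approving the `infra-$(environment)` gate need to know whether a plan destroys anything. The Python sketch below summarizes the JSON form of a plan, as produced by `terraform show -json tfplan`, counting creates, updates, deletes, and replacements. Publishing such a summary from the Plan stage is an assumed addition, not part of the pipeline above.

```python
from __future__ import annotations

import json
from collections import Counter


def summarize_plan(plan_json: str) -> Counter:
    """Count planned actions from `terraform show -json <planfile>` output."""
    plan = json.loads(plan_json)
    counts: Counter = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will change for this resource
        # ["delete", "create"] and ["create", "delete"] both mean replacement
        kind = "replace" if set(actions) == {"create", "delete"} else actions[0]
        counts[kind] += 1
    return counts


def destroys_anything(counts: Counter) -> bool:
    return counts["delete"] > 0 or counts["replace"] > 0
```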

Variable Group Linked to Key Vault

pipelines/variable-group.yaml

# In Azure DevOps UI or REST API:
# Variable group "keyvault-secrets" linked to Key Vault "kv-prod-checkout"
# Secrets auto-mapped: db-connection-string, api-key, jwt-secret

# Pipeline usage:
variables:
  - group: keyvault-secrets

steps:
  - script: |
      # Secret variables are not mapped into the environment automatically;
      # the env block below exposes each one to this step only
      dotnet run --urls "http://+:8080"
    env:
      ConnectionStrings__Default: $(db-connection-string)
      ApiKey: $(api-key)
      Jwt__Secret: $(jwt-secret)
    displayName: 'Run with Key Vault secrets'

Node.js Microservice Pipeline

pipelines/node-microservice.yaml

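# pipelines/node-microservice.yaml
variables:
  - name: npm_config_cache   # npm cache directory, restored and saved by Cache@2
    value: '$(Pipeline.Workspace)/.npm'

stages: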
  - stage: Build
    jobs:
      - job: BuildAndTest
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: '20.x'
          - task: Cache@2
            inputs:
              key: 'npm | "$(Agent.OS)" | package-lock.json'
              path: '$(npm_config_cache)'
          - script: npm ci
            displayName: 'Install dependencies'
          - script: npm run lint
            displayName: 'Lint'
          - script: npm run test:coverage
            displayName: 'Run tests with coverage'
          - task: PublishTestResults@2
            condition: succeededOrFailed()  # publish results even when tests fail
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: 'junit.xml'
          - script: npm run build
            displayName: 'Build'
          - task: Docker@2
            displayName: 'Build & push to ACR'
            inputs:
              containerRegistry: 'acr-service-connection'
              repository: '$(Build.Repository.Name)'
              command: 'buildAndPush'
              tags: '$(Build.BuildId)'

Glossary

Term	Definition
CI/CD	Continuous Integration / Continuous Delivery — automated build, test, and deployment pipeline.
Pipeline Stage	A logical boundary in a pipeline (Build, Test, Deploy) that can have its own agent pool and conditions.
Job	A unit of work within a stage that runs on a single agent.
Step	The smallest unit of execution — a script or task within a job.
Task	A pre-built Azure DevOps building block (e.g., Docker@2, HelmDeploy@0).
Environment	An Azure DevOps resource representing a deployment target with approval gates.
Approval Gate	A manual or automated check that must pass before a pipeline stage can execute.
Variable Group	A named set of variables that can be linked to Azure Key Vault for secret injection.
Service Connection	An Azure DevOps resource providing authenticated access to external services (Azure, ACR, K8s).
Workload Identity Federation	Secretless authentication using federated credentials — no client secret or certificate stored.
Pipeline Template	Reusable YAML defining stages, jobs, or steps that can be consumed via extends or template references.
Extends Template	A pipeline pattern where the consumer pipeline inherits stages from a shared template.
Azure Artifacts	Package management service for NuGet, npm, Maven, Python, and Universal Packages.
ACR Task	Azure Container Registry's built-in build engine for building images without a local Docker daemon.
Helm Release	A deployed instance of a Helm chart on a Kubernetes cluster.

Component: PipelineDashboard

Screen Layout: A table of recent pipeline runs with columns for pipeline name, trigger (CI/manual/scheduled), branch, stages with colored status indicators (green=succeeded, yellow=running, red=failed, gray=skipped), duration, and actor. Click a row to see stage details with individual step logs. Filter by team, status, and date range. Summary cards at top showing: total runs today, success rate, average duration, and deployments count.

Component: DeploymentTimeline

Screen Layout: Horizontal swimlane timeline with rows per environment (Dev, Staging, Production). Each deployment is a colored marker on the timeline showing service name, version, and timestamp. Hover for deployment details. Vertical lines connect promotions across environments.

Mock Data

mock-data/pipeline-runs.json

{
  "pipeline_runs": [
    { "id": 1042, "pipeline": "checkout-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 14, "actor": "jsmith" },
    { "id": 1041, "pipeline": "catalog-api", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "pending"}, "duration_min": 11, "actor": "adoe" },
    { "id": 1040, "pipeline": "order-service", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "failed", "staging": "skipped", "production": "skipped"}, "duration_min": 6, "actor": "mchen" },
    { "id": 1039, "pipeline": "identity-service", "trigger": "manual", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 18, "actor": "kpatel" },
    { "id": 1038, "pipeline": "infra-terraform", "trigger": "ci", "branch": "main", "stages": {"plan": "succeeded", "apply": "succeeded"}, "duration_min": 8, "actor": "platform-bot" },
    { "id": 1037, "pipeline": "checkout-service", "trigger": "ci", "branch": "feature/new-cart", "stages": {"build": "succeeded", "test": "succeeded", "staging": "skipped", "production": "skipped"}, "duration_min": 9, "actor": "jsmith" },
    { "id": 1036, "pipeline": "payment-processor", "trigger": "scheduled", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "running"}, "duration_min": 12, "actor": "scheduler" },
    { "id": 1035, "pipeline": "notification-svc", "trigger": "ci", "branch": "main", "stages": {"build": "succeeded", "test": "succeeded", "staging": "succeeded", "production": "succeeded"}, "duration_min": 10, "actor": "rlee" }
  ],
  "summary": { "runs_today": 24, "success_rate": 87.5, "avg_duration_min": 12.3, "deployments_today": 6 }
}
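The PipelineDashboard summary cards can be derived from run records shaped like the mock data above. The sketch below is illustrative only: the roll-up rules (any failed stage fails the run, a running or pending stage keeps it in progress, skipped stages are neutral) and the definition of a deployment as a succeeded `production` stage are assumptions, not part of the spec.

```python
from statistics import mean


def overall_status(stages: dict[str, str]) -> str:
    """Roll individual stage statuses up into one run status (assumed rules)."""
    values = set(stages.values())
    if "failed" in values:
        return "failed"
    if values & {"running", "pending"}:
        return "running"
    # Only succeeded and skipped stages remain
    return "succeeded"


def summarize(runs: list[dict]) -> dict:
    """Compute summary-card figures for a list of pipeline runs."""
    statuses = [overall_status(r["stages"]) for r in runs]
    finished = [s for s in statuses if s != "running"]
    return {
        "runs": len(runs),
        # Success rate over completed runs; in-progress runs are excluded
        "success_rate": round(100 * finished.count("succeeded") / len(finished), 1)
        if finished else 0.0,
        "avg_duration_min": round(mean(r["duration_min"] for r in runs), 1) if runs else 0.0,
        # Assumption: a "deployment" is a run whose production stage succeeded
        "deployments": sum(r["stages"].get("production") == "succeeded" for r in runs),
    }


if __name__ == "__main__":
    sample = [
        {"stages": {"build": "succeeded", "production": "succeeded"}, "duration_min": 14},
        {"stages": {"build": "succeeded", "test": "failed"}, "duration_min": 6},
        {"stages": {"build": "succeeded", "staging": "running"}, "duration_min": 10},
    ]
    print(summarize(sample))
```

Excluding in-progress runs from the success rate keeps a long-running deployment from temporarily dragging the figure down.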

03 PR Approval Workflows & Codified CAB

Branch Policy Configuration Required

All repositories enforce branch policies on main:

Minimum reviewers: 2 approvals required (1 for documentation-only changes)
Required reviewers by path: auto-assigned based on file path patterns (CODEOWNERS equivalent)
Build validation: CI pipeline must pass before merge is allowed
Comment resolution: all PR comments must be resolved
Work item linking: every PR linked to an Azure Boards work item
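The reviewer-count and path-based assignment rules can be expressed as a small function, for example to preview what a PR will require before it is opened. A minimal sketch: the documentation patterns and the `PATH_REVIEWERS` mapping are hypothetical placeholders for the real branch policy settings.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns; the real ones live in the repository's branch policy settings.
# In fnmatch, "*" also matches "/", so "*.md" covers Markdown files at any depth.
DOC_PATTERNS = ["*.md", "docs/*"]
PATH_REVIEWERS = {
    "infra/*": "platform-team",
    "auth/*": "security-team",
    "migrations/*": "dba-team",
}


def is_docs_only(changed_files: list[str]) -> bool:
    """True when every changed file matches a documentation pattern."""
    return bool(changed_files) and all(
        any(fnmatchcase(f, p) for p in DOC_PATTERNS) for f in changed_files
    )


def review_requirements(changed_files: list[str]) -> tuple[int, list[str]]:
    """Return (minimum approvals, required reviewer groups) for a PR."""
    min_approvals = 1 if is_docs_only(changed_files) else 2
    required = sorted({
        group
        for f in changed_files
        for pattern, group in PATH_REVIEWERS.items()
        if fnmatchcase(f, pattern)
    })
    return min_approvals, required
```

An empty change set is deliberately not treated as documentation-only, so it falls back to the stricter two-approval rule.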

Status Checks

SonarQube quality gate: must pass (no new bugs, 80% coverage on new code)
Defender for DevOps: security scan for secrets, vulnerabilities, and IaC misconfigurations
License compliance: third-party license audit (no GPL in proprietary code)
Build validation pipeline: compile + unit tests must succeed

Merge Strategies

Branch Type	Strategy	Rationale
`feature/*` → `main`	Squash merge	Clean linear history, one commit per feature
`release/*` → `main`	Merge commit	Preserve release branch history for audit
`hotfix/*` → `main`	Merge commit	Preserve hotfix context

Code Review SLA

Priority	Response Time	Approval Time
P1 — Critical	< 1 hour	< 4 hours
P2 — High	< 4 hours	< 8 hours
P3 — Standard	< 8 hours	< 24 hours
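A sketch of how the SLA table might be checked against PR timestamps. It assumes wall-clock hours (no business-hours calendar) and treats a missing response or approval as still open, measured against the current time; `sla_breaches` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Optional

# (response hours, approval hours) from the Code Review SLA table
SLA_HOURS = {"P1": (1, 4), "P2": (4, 8), "P3": (8, 24)}


def sla_breaches(
    priority: str,
    opened_at: datetime,
    first_response_at: Optional[datetime],
    approved_at: Optional[datetime],
    now: datetime,
) -> list[str]:
    """List the SLA targets a PR has missed ("response", "approval")."""
    response_h, approval_h = SLA_HOURS[priority]
    breaches = []
    # A PR still waiting for a response or approval is measured against `now`
    if (first_response_at or now) - opened_at >= timedelta(hours=response_h):
        breaches.append("response")
    if (approved_at or now) - opened_at >= timedelta(hours=approval_h):
        breaches.append("approval")
    return breaches
```

Because the table uses strict "<" targets, reaching the limit exactly counts as a breach.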

Codified CAB (Change Advisory Board) Required

The CAB process is automated through Azure DevOps pipeline environment approvals, replacing manual meetings for most changes.

Risk Assessment

Every PR is automatically scored for change risk based on:

Files changed: infrastructure/security files score higher
Blast radius: number of downstream services affected
Service tier: Tier-1 (customer-facing) changes score higher than Tier-3 (internal tooling)
Change size: large diffs score higher
Database migrations: changes that include migration files score higher

Auto-Approval Rules

Risk Level	Score	Approval
Low	0–30	Auto-approved (documentation, config, non-Tier-1)
Medium	31–70	Single approver from service owner group
High	71–100	Multi-approver: service owner + platform team + security
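The score bands map directly to an approval route. A minimal sketch using the band boundaries from the table; the approver group names are illustrative and mirror the mock data further down.

```python
def approval_route(score: int) -> tuple[str, list[str]]:
    """Map a 0-100 risk score to (risk level, required approver groups)."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 30:
        return "low", []  # auto-approved, no human approver
    if score <= 70:
        return "medium", ["service-owners"]
    # Illustrative group names for "service owner + platform team + security"
    return "high", ["service-owners", "platform-team", "security-team"]
```

The boundaries match the `low`/`medium`/`high` levels computed by `assess_risk` in scripts/risk-assessment.py.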

CAB Automation

Auto-generate change summary from PR title, description, and file diff stats
Integration with ServiceNow via Azure DevOps Service Hooks for change record creation
Post-approval: deployment pipeline triggered automatically
Audit trail: all CAB decisions logged with risk score, approvers, and timestamps

Quality Gates

Unit test coverage: ≥80% on new code, ≥60% overall
Integration test coverage: ≥60% for service boundary tests
Complexity limits: cyclomatic complexity < 15 per method
Duplicated lines: < 3% duplication in new code
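The thresholds above can be evaluated together so that a report lists every failing gate rather than stopping at the first. A sketch, assuming the metrics have already been pulled from SonarQube (or another tool) into a `QualityMetrics` record, a hypothetical type introduced here.

```python
from dataclasses import dataclass


@dataclass
class QualityMetrics:
    new_code_coverage: float      # percent
    overall_coverage: float       # percent
    integration_coverage: float   # percent, service boundary tests
    max_method_complexity: int    # highest cyclomatic complexity of any method
    new_code_duplication: float   # percent of duplicated lines in new code


def quality_gate_failures(m: QualityMetrics) -> list[str]:
    """Return a message for every quality gate threshold that is not met."""
    checks = [
        (m.new_code_coverage >= 80, "unit coverage on new code < 80%"),
        (m.overall_coverage >= 60, "overall unit coverage < 60%"),
        (m.integration_coverage >= 60, "integration coverage < 60%"),
        (m.max_method_complexity < 15, "a method has cyclomatic complexity >= 15"),
        (m.new_code_duplication < 3, "duplication in new code >= 3%"),
    ]
    return [message for ok, message in checks if not ok]
```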

PR Template

.azuredevops/pull_request_template.md

## Summary


## Type of Change
- [ ] Feature (new functionality)
- [ ] Bug fix
- [ ] Infrastructure / Config
- [ ] Documentation
- [ ] Refactor (no behavior change)

## Checklist
- [ ] Unit tests added/updated (coverage ≥80%)
- [ ] Integration tests added (if applicable)
- [ ] Documentation updated
- [ ] Feature flag created (if new feature)
- [ ] Runbook updated (if operational change)
- [ ] Security review (if auth/data changes)
- [ ] Database migration tested (if schema change)

## Risk Assessment
- **Blast radius:** [Low / Medium / High]
- **Service tier:** [Tier-1 / Tier-2 / Tier-3]
- **Rollback plan:** [Describe rollback strategy]

## Related Work Items
AB#

Change Risk Assessment Script

scripts/risk-assessment.py

#!/usr/bin/env python3
"""Automated change risk assessment for Azure DevOps PRs."""
import json
import os
from dataclasses import dataclass

HIGH_RISK_PATHS = [
    "infra/", "terraform/", "k8s/", "charts/",
    "Dockerfile", "docker-compose",
    ".github/", ".azuredevops/", "pipelines/",
    "migrations/", "security/", "auth/",
]

TIER_1_SERVICES = [
    "checkout-service", "payment-processor",
    "identity-service", "order-service",
]

@dataclass
class RiskScore:
    total: int
    level: str  # low, medium, high
    factors: list

def assess_risk(
    changed_files: list[str],
    lines_changed: int,
    service_name: str,
    downstream_count: int,
) -> RiskScore:
    score = 0
    factors = []

    # File path risk
    # Match high-risk paths at any directory depth (e.g. "src/auth/", "svc/Dockerfile")
    infra_files = [f for f in changed_files
                   if any(f"/{p}" in f"/{f}" for p in HIGH_RISK_PATHS)]
    if infra_files:
        score += 30
        factors.append(f"Infrastructure files: {len(infra_files)}")

    # Blast radius
    if downstream_count > 5:
        score += 25
        factors.append(f"High blast radius: {downstream_count} services")
    elif downstream_count > 2:
        score += 15
        factors.append(f"Medium blast radius: {downstream_count} services")

    # Service tier
    if service_name in TIER_1_SERVICES:
        score += 20
        factors.append(f"Tier-1 service: {service_name}")

    # Change size
    if lines_changed > 500:
        score += 15
        factors.append(f"Large change: {lines_changed} lines")
    elif lines_changed > 200:
        score += 10
        factors.append(f"Medium change: {lines_changed} lines")

    # Database migrations
    migration_files = [f for f in changed_files if "migration" in f.lower()]
    if migration_files:
        score += 20
        factors.append(f"Database migration: {len(migration_files)} files")

    level = "low" if score <= 30 else "medium" if score <= 70 else "high"
    return RiskScore(total=min(score, 100), level=level, factors=factors)

if __name__ == "__main__":
    result = assess_risk(
        changed_files=json.loads(os.environ.get("CHANGED_FILES", "[]")),
        lines_changed=int(os.environ.get("LINES_CHANGED", "0")),
        service_name=os.environ.get("SERVICE_NAME", ""),
        downstream_count=int(os.environ.get("DOWNSTREAM_COUNT", "0")),
    )
    print(json.dumps({
        "score": result.total,
        "level": result.level,
        "factors": result.factors,
    }, indent=2))
    # Set Azure DevOps variable for pipeline gates
    print(f"##vso[task.setvariable variable=riskLevel]{result.level}")
    print(f"##vso[task.setvariable variable=riskScore]{result.total}")

Build Validation Pipeline

pipelines/pr-validation.yaml

# pipelines/pr-validation.yaml
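# Registered as a build validation branch policy on main. Azure Repos ignores the
# YAML pr: trigger below (it applies to GitHub and Bitbucket Cloud repositories).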
trigger: none
pr:
  branches:
    include: [main]

pool:
  vmImage: 'ubuntu-latest'

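steps:
  # Full history so the risk assessment can diff against origin/main
  - checkout: self
    fetchDepth: 0

  # MSBuild scanner mode: Prepare must run before the build, Analyze after the tests
  - task: SonarQubePrepare@5
    inputs:
      SonarQube: 'sonarqube-connection'
      scannerMode: 'MSBuild'
      projectKey: '$(Build.Repository.Name)'
      extraProperties: |
        sonar.cs.opencover.reportsPaths=$(Agent.TempDirectory)/**/coverage.opencover.xml

  - script: dotnet restore && dotnet build --configuration Release
    displayName: 'Build'

  # --no-build needs the same configuration as the build step
  - script: >-
      dotnet test --no-build --configuration Release
      --collect:"XPlat Code Coverage" --results-directory $(Agent.TempDirectory)
      -- DataCollectionRunSettings.DataCollectors.DataCollector.Configuration.Format=opencover
    displayName: 'Unit tests'

  - task: SonarQubeAnalyze@5
  - task: SonarQubePublish@5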

  # Risk assessment
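  - script: |
      # Changed files as a compact JSON array (an empty diff yields [])
      export CHANGED_FILES=$(git diff --name-only origin/main...HEAD | jq -R . | jq -s -c .)
      # Added + deleted lines; binary files report "-" and count as 0
      export LINES_CHANGED=$(git diff --numstat origin/main...HEAD | awk '{s += $1 + $2} END {print s + 0}')
      # $(pwd) is the agent's sources directory ("s"), so use the repository name instead
      export SERVICE_NAME="$BUILD_REPOSITORY_NAME"
      python3 scripts/risk-assessment.py
    displayName: 'Risk assessment'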

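  - script: |
      echo "Risk Level: $(riskLevel)"
      echo "Risk Score: $(riskScore)"
      # uploadsummary attaches a Markdown file to the run's Summary tab
      printf '## Change risk\n\n- Level: %s\n- Score: %s\n' "$(riskLevel)" "$(riskScore)" > "$(Agent.TempDirectory)/risk.md"
      echo "##vso[task.uploadsummary]$(Agent.TempDirectory)/risk.md"
    displayName: 'Report risk score'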

ServiceNow Integration via Service Hook

hooks/servicenow-change-record.json

{
  "publisherId": "tfs",
  "eventType": "ms.vss-release.deployment-approval-pending-event",
  "consumerId": "webHooks",
  "consumerActionId": "httpRequest",
  "consumerInputs": {
    "url": "https://acme.service-now.com/api/sn_chg_rest/change",
    "httpHeaders": "Content-Type: application/json\nAuthorization: Bearer {{token}}",
    "resourceDetailsToSend": "all",
    "messagesToSend": "all",
    "detailedMessagesToSend": "all"
  },
  "publisherInputs": {
    "releaseEnvironmentId": "production",
    "releaseDefinitionId": "",
    "projectId": "{{projectId}}"
  }
}

Codified CAB: Environment with Approval Gates

pipelines/cab-gate.yaml

# Deployment stage with codified CAB approval
- stage: ProductionDeploy
  displayName: 'Production Deployment (CAB Gated)'
  dependsOn: StagingValidation
  jobs:
    - deployment: CabApproval
      # Environment 'aks-production' configured with:
      # - Approval: service-owners group + platform-team
      # - Branch control: only from refs/heads/main
      # - Business hours: Mon-Fri 06:00-18:00 CST
      # - Exclusive lock: one deployment at a time
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
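            steps:
              # Deployment jobs skip source checkout by default; HelmDeploy needs charts/ from the repo
              - checkout: self
              # Environment approvals and checks are evaluated before this job starts, so this
              # step only records the decision. riskLevel is set by the PR validation pipeline
              # and must be supplied to this run (e.g. as a pipeline variable) to be defined here.
              - script: |
                  echo "CAB risk level: $(riskLevel)"
                  if [ "$(riskLevel)" = "low" ]; then
                    echo "Low-risk change: auto-approved by codified CAB"
                  fi
                displayName: 'CAB decision log'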
              - task: HelmDeploy@0
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'upgrade'
                  chartType: 'FilePath'
                  chartPath: 'charts/$(service)'
                  releaseName: '$(service)'
                  overrideValues: 'image.tag=$(Build.BuildId)'

Glossary

Term	Definition
Branch Policy	Azure DevOps Repos configuration enforcing rules on branch operations (merge requirements, build validation).
Build Validation	A required pipeline run triggered on every PR that must succeed before merge is permitted.
Required Reviewer	An auto-assigned reviewer based on file path patterns, ensuring domain experts review relevant changes.
Status Check	An external validation (SonarQube, security scan) that reports pass/fail to the PR.
Squash Merge	Combining all commits in a feature branch into a single commit on the target branch.
PR Template	A markdown template pre-populating PR descriptions with checklists and required fields.
CODEOWNERS	File path–based automatic reviewer assignment (Azure DevOps uses "Required Reviewer" policies).
Codified CAB	An automated Change Advisory Board that uses pipeline gates and risk scores instead of manual meetings.
Change Risk Assessment	Automated scoring of a change's risk level based on file paths, blast radius, and service tier.
Blast Radius	The number of downstream services or users affected if the change introduces a defect.
Change Record	A formal record in ServiceNow/ITSM documenting the change, its risk, and approval history.
Service Hook	Azure DevOps webhook that sends event notifications to external services (ServiceNow, Slack).
Quality Gate	A set of measurable thresholds (coverage, complexity) that code must meet to proceed.
Technical Debt Score	A SonarQube metric estimating the effort required to fix all code maintainability issues.

Component: PRWorkflowBoard

Screen Layout: Kanban-style board with columns: Draft → In Review → Changes Requested → Approved → Merged. Each PR card shows title, author avatar, reviewer status (pending/approved/rejected), quality gate results (SonarQube, security scan), and age indicator. Cards have colored borders based on risk level (green=low, yellow=medium, red=high).

Component: CABDashboard

Screen Layout: Queue of pending change requests with columns: service, change summary, risk score (color-coded), requested by, approval status, and scheduled deployment window. History section showing past approvals with auto-approve rate percentage. Pie chart breakdown: auto-approved vs. manual approval vs. rejected.

Mock Data

mock-data/pr-workflow.json

{
  "pull_requests": [
    { "id": 1234, "title": "Add passkey authentication flow", "author": "jsmith", "status": "in_review", "reviewers": [{"name": "adoe", "status": "approved"}, {"name": "mchen", "status": "pending"}], "risk_score": 65, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "82%"} },
    { "id": 1235, "title": "Update Helm chart resource limits", "author": "kpatel", "status": "approved", "reviewers": [{"name": "platform-team", "status": "approved"}], "risk_score": 45, "risk_level": "medium", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} },
    { "id": 1236, "title": "Fix checkout total calculation", "author": "adoe", "status": "changes_requested", "reviewers": [{"name": "jsmith", "status": "changes_requested"}], "risk_score": 55, "risk_level": "medium", "quality_gates": {"sonarqube": "failed", "security": "passed", "coverage": "75%"} },
    { "id": 1237, "title": "Update README badges", "author": "rlee", "status": "merged", "reviewers": [{"name": "auto-approve", "status": "approved"}], "risk_score": 5, "risk_level": "low", "quality_gates": {"sonarqube": "passed", "security": "passed", "coverage": "N/A"} }
  ],
  "cab_requests": [
    { "id": "CHG-2026-101", "service": "checkout-service", "summary": "Deploy passkey authentication", "risk_score": 65, "status": "pending", "approvers": ["service-owners", "security-team"] },
    { "id": "CHG-2026-100", "service": "catalog-api", "summary": "Enable vector search ranking", "risk_score": 25, "status": "auto-approved", "approvers": ["auto-cab"] },
    { "id": "CHG-2026-099", "service": "order-service", "summary": "Database schema migration v42", "risk_score": 85, "status": "approved", "approvers": ["service-owners", "platform-team", "dba-team"] },
    { "id": "CHG-2026-098", "service": "notification-svc", "summary": "Update email templates", "risk_score": 15, "status": "auto-approved", "approvers": ["auto-cab"] },
    { "id": "CHG-2026-097", "service": "identity-service", "summary": "Rotate JWT signing keys", "risk_score": 90, "status": "pending", "approvers": ["service-owners", "security-team", "platform-team"] }
  ]
}
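The CABDashboard's pie chart and auto-approve rate can be computed from `cab_requests` records like those above. This sketch assumes the rate is taken over decided requests only (pending requests excluded) and that a `rejected` status exists even though the mock data has none.

```python
from collections import Counter


def cab_breakdown(requests: list[dict]) -> dict:
    """Pie-chart counts and auto-approve rate for the CAB dashboard."""
    counts = Counter(r["status"] for r in requests)
    decided = counts["auto-approved"] + counts["approved"] + counts["rejected"]
    return {
        "auto_approved": counts["auto-approved"],
        "manual_approved": counts["approved"],
        "rejected": counts["rejected"],
        "pending": counts["pending"],
        # Assumption: pending requests are left out of the rate
        "auto_approve_rate": round(100 * counts["auto-approved"] / decided, 1) if decided else 0.0,
    }
```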

04 Deployment Compliance Gates

Pre-Deployment Compliance Checks Required

Before any deployment to staging or production, the pipeline evaluates a set of compliance gates. All gates must pass — a single failure blocks the deployment.

Deployment Blocking Criteria

Critical/High vulnerabilities: Defender for DevOps or container scan results with Critical or High CVEs block deployment
Azure Policy violations: any resource that is non-compliant with a deny-effect policy
Missing required tags: cost-center, owner, environment, data-classification
Dependency SLA: if a critical dependency service is degraded, deployment is blocked
Change window violation: no deployments during blackout periods (weekends, holidays, 10pm–6am CST)
Test coverage: unit test coverage below 80% on changed files
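Because a single failing gate blocks the deployment, gate evaluation reduces to collecting every result and checking that none failed. A sketch of that all-must-pass rule with two example gates (required tags and changed-file coverage); `GateResult` and the gate helpers are hypothetical names, not an existing API.

```python
from dataclasses import dataclass

REQUIRED_TAGS = {"cost-center", "owner", "environment", "data-classification"}


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def tag_gate(tags: dict[str, str]) -> GateResult:
    """Fail when any mandatory resource tag is absent."""
    missing = sorted(REQUIRED_TAGS.difference(tags))
    return GateResult("required-tags", not missing,
                      "missing " + ", ".join(missing) if missing else "")


def coverage_gate(coverage_by_file: dict[str, float], threshold: float = 80.0) -> GateResult:
    """Fail when any changed file is below the unit-coverage threshold."""
    low = sorted(f for f, pct in coverage_by_file.items() if pct < threshold)
    return GateResult("test-coverage", not low, ", ".join(low))


def evaluate_gates(results: list[GateResult]) -> tuple[bool, list[str]]:
    """All gates must pass; any single failure blocks the deployment."""
    failures = [f"{g.name}: {g.detail}" for g in results if not g.passed]
    return not failures, failures
```

Collecting all failures, instead of returning on the first one, gives the ComplianceGateway view a complete checklist for the deployment.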
Missing runbook: Tier-1 services must have an updated runbook

Azure Policy for AKS Required

Azure Policy for Kubernetes enforces guardrails on AKS workloads:

No privileged containers (deny)
Resource limits required on all containers (deny)
Only images from approved ACR (deny)
Required labels: app.kubernetes.io/name, acme.com/team (deny)
No host network or host PID (deny)
ReadOnlyRootFilesystem recommended (audit)
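Azure Policy enforces these rules in-cluster through Gatekeeper, but the same checks can run earlier, for example against rendered Helm output in CI. A minimal sketch over a pod manifest already parsed into a dict; the approved registry `acmeprod.azurecr.io` is a hypothetical placeholder, and init containers are not checked.

```python
APPROVED_REGISTRY = "acmeprod.azurecr.io/"  # hypothetical ACR login server
REQUIRED_LABELS = ("app.kubernetes.io/name", "acme.com/team")


def pod_violations(pod: dict) -> list[str]:
    """Check a parsed pod manifest against the AKS guardrails listed above."""
    spec = pod.get("spec", {})
    labels = pod.get("metadata", {}).get("labels", {})
    violations = [f"deny: missing label {label}" for label in REQUIRED_LABELS
                  if label not in labels]
    if spec.get("hostNetwork") or spec.get("hostPID"):
        violations.append("deny: hostNetwork/hostPID enabled")
    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            violations.append(f"deny: {name} is privileged")
        limits = c.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            violations.append(f"deny: {name} missing cpu/memory limits")
        if not c.get("image", "").startswith(APPROVED_REGISTRY):
            violations.append(f"deny: {name} image not from approved ACR")
        # readOnlyRootFilesystem is only audited, not denied
        if not sc.get("readOnlyRootFilesystem"):
            violations.append(f"audit: {name} root filesystem is writable")
    return violations
```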

Defender for Cloud Integration

Defender for Cloud security recommendations must be addressed before promotion to production. The Secure Score must remain above the team's minimum threshold (default: 80/100).

Compliance as Code Recommended

All Azure Policy definitions stored as Terraform in Git. Changes to policies follow the same PR review process as application code. Policy assignments versioned and deployed via pipeline.

Deployment Windows & Blackout Periods

Window	Schedule	Allowed
Standard	Mon–Fri 6:00–18:00 CST	All deployments
Extended	Mon–Fri 18:00–22:00 CST	Tier-2/3 only, with approval
Blackout	Weekends, holidays, 22:00–6:00	Emergency hotfixes only (P1)
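The window table can be encoded as a function, which also covers the parts a simple hour check misses: holidays, the Tier-2/3 rule for the extended window, and the P1 exception during blackout. A sketch, assuming tiers are passed as integers, priorities as "P1" to "P3", and a caller-supplied holiday set; it uses the `America/Chicago` zone, so "CST" here follows daylight saving time.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")


def deployment_decision(when: datetime, tier: int, priority: str,
                        holidays: set[date]) -> str:
    """Return "allowed", "needs-approval", or "blocked" for a timezone-aware time."""
    local = when.astimezone(CENTRAL)
    blackout = (
        local.weekday() >= 5                 # Saturday=5, Sunday=6
        or local.date() in holidays
        or local.hour >= 22 or local.hour < 6
    )
    if blackout:
        # Emergency hotfixes only
        return "allowed" if priority == "P1" else "blocked"
    if local.hour < 18:
        return "allowed"                     # Standard window
    # Extended window 18:00-22:00: Tier-2/3 only, with approval
    return "needs-approval" if tier in (2, 3) else "blocked"
```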

Post-Deployment Validation

Smoke tests: automated HTTP health checks against deployed endpoints
Synthetic monitoring: Application Insights availability tests for critical user flows
Automatic rollback: if smoke tests fail, pipeline triggers Helm rollback
Soak period: 30-minute monitoring window before declaring deployment successful
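The validation steps combine into a single keep-or-rollback decision. A sketch, assuming the smoke test results and soak-period error rate have already been collected; it treats missing telemetry as a failure (fail closed), which is a design choice rather than something the spec states. The 1% threshold matches the pipeline and alert examples below.

```python
from typing import Optional


def post_deploy_verdict(
    smoke_status: dict[str, int],
    error_rate: Optional[float],
    threshold: float = 0.01,
) -> tuple[str, list[str]]:
    """Return ("keep" or "rollback", reasons) for a fresh deployment.

    smoke_status maps endpoint -> HTTP status code; error_rate is the failed-request
    ratio observed during the soak period, or None if telemetry was unavailable.
    """
    reasons = [f"{ep} returned {code}" for ep, code in smoke_status.items() if code != 200]
    if error_rate is None:
        # Fail closed: no telemetry is treated like a failed soak
        reasons.append("no soak-period telemetry")
    elif error_rate > threshold:
        reasons.append(f"error rate {error_rate:.2%} exceeds {threshold:.2%}")
    return ("rollback" if reasons else "keep"), reasons
```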

Audit Trail Required

Every deployment produces an immutable audit record in Azure DevOps + Log Analytics containing: who approved, what changed, when deployed, why (linked work item), risk score, and compliance gate results.
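One way to shape the audit record so it lines up with the `DeploymentAudit_CL` columns queried in the KQL below. A sketch: the field names are inferred from those columns, and `compliance_gates` is serialized as a JSON string because the query calls `parse_json(compliance_gates_s)`. Shipping the record to Log Analytics is out of scope here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentAuditRecord:
    service: str
    environment: str
    version: str
    deployed_by: str
    approved_by: str          # who approved, e.g. comma-separated names
    work_item_id: str         # why: the linked Azure Boards work item
    risk_score: int
    risk_level: str
    pipeline_run_id: str
    compliance_gates: list    # e.g. [{"name": "policy", "status": "passed"}]
    status: str
    time_generated: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        row = asdict(self)
        # The KQL calls parse_json(compliance_gates_s), so gates travel as a JSON string
        row["compliance_gates"] = json.dumps(self.compliance_gates)
        return json.dumps(row, sort_keys=True)
```

`frozen=True` only prevents accidental mutation in the producing code; immutability of the stored log still depends on workspace retention and access controls.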

Azure Policy: Deny Untagged Resources

policies/require-tags.tf

# policies/require-tags.tf
resource "azurerm_policy_definition" "require_tags" {
  name         = "require-mandatory-tags"
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "Require mandatory resource tags"
  description  = "Denies resource creation without required tags"

  policy_rule = jsonencode({
    if = {
      anyOf = [
        { field = "tags['cost-center']", exists = false },
        { field = "tags['owner']", exists = false },
        { field = "tags['environment']", exists = false },
        { field = "tags['data-classification']", exists = false }
      ]
    }
    then = { effect = "deny" }
  })
}

resource "azurerm_policy_definition" "aks_no_privileged" {
  name         = "aks-deny-privileged-containers"
  policy_type  = "Custom"
  mode         = "Microsoft.Kubernetes.Data"
  display_name = "AKS: Deny privileged containers"

  policy_rule = jsonencode({
    if = {
      field  = "type"
      equals = "Microsoft.ContainerService/managedClusters"
    }
    then = {
      effect = "deny"
      details = {
        templateInfo = {
          sourceType = "PublicURL"
          url        = "https://store.policy.core.windows.net/kubernetes/container-no-privilege/v2/template.yaml"
        }
        # Scope the Gatekeeper constraint to pods; system namespaces are excluded
        apiGroups          = [""]
        kinds              = ["Pod"]
        excludedNamespaces = ["kube-system", "gatekeeper-system"]
      }
    }
  })
}

Azure Policy Initiative: Deployment Readiness

policies/deployment-readiness-initiative.tf

# policies/deployment-readiness-initiative.tf
resource "azurerm_policy_set_definition" "deployment_readiness" {
  name         = "deployment-readiness-v1"
  policy_type  = "Custom"
  display_name = "Deployment Readiness Initiative"
  description  = "All policies that must pass before production deployment"

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.require_tags.id
    reference_id         = "requireTags"
  }

  policy_definition_reference {
    policy_definition_id = azurerm_policy_definition.aks_no_privileged.id
    reference_id         = "aksNoPrivileged"
  }

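  # Built-in definitions with required parameters (e.g. an allowed-image regex or
  # CPU/memory limits) also need parameter_values set on their reference
  policy_definition_reference {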
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/febd0533-8e55-448f-b837-bd0e06f16469"
    reference_id         = "aksResourceLimits"
  }

  policy_definition_reference {
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/d2d3ab89-5a4e-4921-81fa-3c2c0a69fc6b"
    reference_id         = "aksApprovedImages"
  }
}

data "azurerm_subscription" "current" {}

resource "azurerm_subscription_policy_assignment" "readiness" {
  name                 = "deploy-readiness"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = azurerm_policy_set_definition.deployment_readiness.id
  display_name         = "Deployment Readiness Checks"
  enforce              = true
}

Pre-Deployment Compliance Check Pipeline

pipelines/compliance-gate.yaml

# pipelines/compliance-gate.yaml
stages:
  - stage: ComplianceGate
    displayName: 'Pre-Deployment Compliance'
    jobs:
      - job: PolicyCheck
        displayName: 'Azure Policy Compliance'
        steps:
          - task: AzureCLI@2
            displayName: 'Check policy compliance'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                # Get non-compliant resources in target resource group
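                NON_COMPLIANT=$(az policy state summarize \
                  --resource-group "rg-$(environment)-$(service)" \
                  --query "results.nonCompliantResources" -o tsv)
                # Fail closed: an empty result must not be read as "compliant"
                if [ -z "$NON_COMPLIANT" ]; then
                  echo "##vso[task.logissue type=error]Could not read policy compliance summary"
                  exit 1
                fi
                if [ "$NON_COMPLIANT" -gt 0 ]; then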
                  echo "##vso[task.logissue type=error]$NON_COMPLIANT non-compliant resources found"
                  az policy state list \
                    --resource-group "rg-$(environment)-$(service)" \
                    --filter "complianceState eq 'NonCompliant'" \
                    --query "[].{resource:resourceId, policy:policyDefinitionName}" -o table
                  exit 1
                fi
                echo "All resources compliant"

      - job: VulnerabilityCheck
        displayName: 'Defender Vulnerability Check'
        steps:
          - task: AzureCLI@2
            displayName: 'Check Defender findings'
            inputs:
              azureSubscription: 'azure-service-connection'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
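                # Count only unhealthy High/Critical findings (healthy sub-assessments are skipped)
                CRITICAL=$(az security sub-assessment list \
                  --assessed-resource-id "/subscriptions/$(subscriptionId)" \
                  --query "length([?status.code=='Unhealthy' && (status.severity=='High' || status.severity=='Critical')])" -o tsv)
                if [ -z "$CRITICAL" ]; then
                  echo "##vso[task.logissue type=error]Could not read Defender findings"
                  exit 1
                fi
                if [ "$CRITICAL" -gt 0 ]; then
                  echo "##vso[task.logissue type=error]$CRITICAL critical/high vulnerabilities found"
                  exit 1
                fi
                echo "No critical/high findings"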

      - job: BlackoutCheck
        displayName: 'Deployment Window Check'
        steps:
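          - script: |
              # Central Time (America/Chicago, so CST or CDT). Enforces only the outer
              # Mon-Fri 06:00-22:00 bound: holidays and the Tier-2/3-only rule for
              # 18:00-22:00 are not checked here.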
              HOUR=$(TZ="America/Chicago" date +%H)
              DAY=$(TZ="America/Chicago" date +%u)
              if [ "$DAY" -gt 5 ] || [ "$HOUR" -lt 6 ] || [ "$HOUR" -ge 22 ]; then
                echo "##vso[task.logissue type=error]Deployment blocked: outside deployment window"
                echo "Current time: $(TZ='America/Chicago' date)"
                echo "Allowed: Mon-Fri 06:00-22:00 CST"
                exit 1
              fi
              echo "Within deployment window"
            displayName: 'Check deployment window'

Post-Deployment Smoke Test & Rollback

pipelines/post-deploy-validation.yaml

# pipelines/post-deploy-validation.yaml
- stage: SmokeTest
  displayName: 'Post-Deployment Validation'
  dependsOn: ProductionDeploy
  jobs:
    - job: SmokeTests
      steps:
        - script: |
            echo "Running smoke tests against $(service).$(domain)"
            for endpoint in /health /ready /api/status; do
              STATUS=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" \
                "https://$(service).$(domain)${endpoint}")
              if [ "$STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Smoke test failed: ${endpoint} returned $STATUS"
                exit 1
              fi
              echo "✓ ${endpoint}: $STATUS"
            done
          displayName: 'HTTP smoke tests'

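        - task: AzureCLI@2
          displayName: 'Soak period monitoring'
          inputs:
            azureSubscription: 'azure-service-connection'
            scriptType: 'bash'
            scriptLocation: 'inlineScript'
            inlineScript: |
              echo "Soak period: monitoring for 30 minutes..."
              sleep 1800
              # 'az monitor app-insights' is provided by the application-insights extension
              az extension add --name application-insights --only-show-errors
              # Check Application Insights for an error spike (appInsightsResourceGroup is an assumed variable)
              ERROR_RATE=$(az monitor app-insights query \
                --app "$(appInsightsName)" \
                --resource-group "$(appInsightsResourceGroup)" \
                --analytics-query "requests | where timestamp > ago(30m) | summarize errorRate=todouble(countif(success==false))/count()" \
                --query "tables[0].rows[0][0]" -o tsv)
              if [ -z "$ERROR_RATE" ] || (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
                echo "##vso[task.logissue type=error]Error rate '$ERROR_RATE' exceeds threshold or could not be read"
                exit 1
              fi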

- stage: AutoRollback
  displayName: 'Auto-Rollback on Failure'
  dependsOn: SmokeTest
  condition: failed()
  jobs:
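    - deployment: Rollback
      # 'aks-production' carries manual approvals and business-hours checks, which
      # would hold this rollback until approved; an environment without manual
      # approval avoids that delay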
      environment: 'aks-production'
      strategy:
        runOnce:
          deploy:
            steps:
              - task: HelmDeploy@0
                displayName: 'Helm rollback'
                inputs:
                  connectionType: 'Kubernetes Service Connection'
                  kubernetesServiceConnection: 'aks-production'
                  namespace: '$(team)'
                  command: 'rollback'
                  releaseName: '$(service)'
                  arguments: '--wait --timeout 5m'
              - script: |
                  echo "##vso[task.logissue type=warning]ROLLBACK EXECUTED for $(service)"
                  # Notify Teams channel
                  curl -H 'Content-Type: application/json' \
                    -d '{"text":"⚠️ Auto-rollback executed for $(service) in production"}' \
                    "$(teamsWebhookUrl)"
                displayName: 'Notify rollback'

Log Analytics: Deployment Audit Trail (KQL)

queries/deployment-audit.kql

// Deployment Audit Trail — KQL Query
// Run in Log Analytics workspace
DeploymentAudit_CL
| where TimeGenerated > ago(30d)
| project
    TimeGenerated,
    Service = service_s,
    Environment = environment_s,
    Version = version_s,
    DeployedBy = deployed_by_s,
    ApprovedBy = approved_by_s,
    RiskScore = risk_score_d,
    RiskLevel = risk_level_s,
    PipelineRunId = pipeline_run_id_s,
    WorkItemId = work_item_id_s,
    ComplianceGates = compliance_gates_s,
    Duration = duration_s,
    Status = status_s
| order by TimeGenerated desc

// Compliance Gate Summary — last 7 days
DeploymentAudit_CL
| where TimeGenerated > ago(7d)
| extend Gates = parse_json(compliance_gates_s)
| mv-expand Gate = Gates
| summarize
    PassCount = countif(tostring(Gate.status) == "passed"),
    FailCount = countif(tostring(Gate.status) == "failed")
    by GateName = tostring(Gate.name)
| extend PassRate = round(todouble(PassCount) / (PassCount + FailCount) * 100, 1)
| order by PassRate asc

Azure Monitor: Deployment Health Alert

modules/monitoring/deployment-alert.tf

# modules/monitoring/deployment-alert.tf
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "deploy_health" {
  name                = "alert-deployment-error-spike-${var.service_name}"
  resource_group_name = var.resource_group_name
  location            = var.location
  description         = "Error rate spike detected after deployment"
  severity            = 1
  enabled             = true

  scopes              = [var.application_insights_id]
  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"

  criteria {
    query = <<-KQL
      requests
      | where timestamp > ago(15m)
      | summarize
          errorRate = todouble(countif(success == false)) / count(),
          totalRequests = count()
      | where errorRate > 0.01 and totalRequests > 100
    KQL
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [var.action_group_id]
  }

  tags = var.common_tags
}

Glossary

Term	Definition
Compliance Gate	An automated check that must pass before a deployment is permitted to proceed.
Azure Policy	Azure-native governance service that enforces organizational rules on resources at scale.
Policy Initiative	A collection of Azure Policy definitions grouped and assigned together as a single unit.
Deny Effect	Azure Policy effect that blocks resource creation or update if the rule is violated.
Audit Effect	Azure Policy effect that flags non-compliance without blocking the operation.
Defender for Cloud	Unified security posture management and threat protection across Azure resources.
Secure Score	A Defender for Cloud metric (0–100) representing the security posture of your subscriptions.
Deployment Readiness	The state where all pre-deployment gates (tests, scans, policies, approvals) have passed.
Blackout Window	A time period during which production deployments are prohibited, for example weekends, holidays, or late-night hours.
Smoke Test	Minimal health-check tests run immediately after deployment to validate basic functionality.
Synthetic Monitor	Application Insights availability tests that simulate user flows at regular intervals.
Rollback Trigger	An automated condition (error rate spike, smoke test failure) that initiates deployment rollback.
Change Window	The approved time range during which deployments are permitted.
Data Classification Tag	A required resource tag indicating the sensitivity level of data (public, internal, confidential, restricted).
Immutable Audit Log	A tamper-proof record of all deployment actions stored in Log Analytics with retention policies.
KQL	Kusto Query Language — the query language used in Azure Log Analytics and Application Insights.
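The Change Window and Blackout Window entries describe a time-based gate. A minimal sketch of that check, assuming a hypothetical policy of weekdays 09:00 to 16:00 UTC plus a list of blacked-out dates; the real hours and holiday calendar are organisational choices this specification does not define, and `checkDeploymentWindow` is an illustrative name.

```typescript
// Hypothetical window policy; the hours and dates are illustrative assumptions.
interface ChangeWindowPolicy {
  allowedDaysUtc: number[]; // 0 = Sunday ... 6 = Saturday, as returned by getUTCDay()
  startHourUtc: number;     // inclusive
  endHourUtc: number;       // exclusive
  blackoutDates: string[];  // "YYYY-MM-DD" UTC calendar dates
}

interface WindowDecision {
  allowed: boolean;
  reason: string;
}

function checkDeploymentWindow(at: Date, policy: ChangeWindowPolicy): WindowDecision {
  const isoDate = at.toISOString().slice(0, 10); // UTC calendar date
  // Blackout dates are checked first so the reason names the most specific block.
  if (policy.blackoutDates.includes(isoDate)) {
    return { allowed: false, reason: `blackout date ${isoDate}` };
  }
  if (!policy.allowedDaysUtc.includes(at.getUTCDay())) {
    return { allowed: false, reason: "outside allowed days" };
  }
  const hour = at.getUTCHours();
  if (hour < policy.startHourUtc || hour >= policy.endHourUtc) {
    return { allowed: false, reason: "outside change window hours" };
  }
  return { allowed: true, reason: "within change window" };
}

// Example policy (hypothetical): Monday to Friday, 09:00 to 16:00 UTC.
const examplePolicy: ChangeWindowPolicy = {
  allowedDaysUtc: [1, 2, 3, 4, 5],
  startHourUtc: 9,
  endHourUtc: 16,
  blackoutDates: ["2026-12-25"],
};
```

Working in UTC throughout avoids a deployment being allowed or blocked depending on the time zone of the agent running the gate.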

Component: ComplianceGateway

Screen Layout: A vertical checklist for a specific deployment, with each gate showing: gate name, status (pass/fail/pending), details, and timestamp. Gates include: Azure Policy compliance, vulnerability scan, tag validation, deployment window check, test coverage, runbook verification, and approval status. Overall readiness indicator (green checkmark or red block) at the top.
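The overall readiness indicator can be derived from the individual gate records. A minimal sketch, assuming the `ComplianceGate` fields from the data model and the pass/fail/pending statuses listed above; `computeReadiness` is a hypothetical helper. Any gate that is not `pass` blocks readiness, and the failing and pending gate names are returned so the checklist can highlight them.

```typescript
type GateStatus = "pass" | "fail" | "pending";

interface ComplianceGate {
  name: string;
  status: GateStatus;
  details: string;
  timestamp: string; // ISO 8601
}

interface Readiness {
  ready: boolean;    // drives the green checkmark / red block indicator
  failing: string[]; // names of gates with status "fail"
  pending: string[]; // names of gates still awaiting a result
}

function computeReadiness(gates: ComplianceGate[]): Readiness {
  const failing = gates.filter((g) => g.status === "fail").map((g) => g.name);
  const pending = gates.filter((g) => g.status === "pending").map((g) => g.name);
  // An empty checklist is treated as not ready: nothing has been verified yet.
  const ready = gates.length > 0 && failing.length === 0 && pending.length === 0;
  return { ready, failing, pending };
}
```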

Component: PolicyDashboard

Screen Layout: Azure Policy compliance overview with compliance percentage per resource group. Drill-down table showing individual policy violations with resource name, policy name, effect, and remediation guidance. Filter by subscription, resource group, and policy category.
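The per-resource-group compliance percentage can be computed from flattened policy-state records. This is a simplified sketch under assumptions: the `PolicyStateRecord` shape and `complianceByResourceGroup` are hypothetical, and a resource counts as compliant only if every policy evaluated against it is compliant. Azure Policy's own percentage also accounts for states such as exemptions, so figures shown in the portal may differ.

```typescript
// Hypothetical flattened policy-state record served by the mock API.
interface PolicyStateRecord {
  resourceGroup: string;
  resourceId: string;
  policyName: string;
  compliant: boolean;
}

// Returns resource group -> compliance percentage, rounded to one decimal place.
function complianceByResourceGroup(records: PolicyStateRecord[]): Record<string, number> {
  // resourceGroup -> resourceId -> "every policy evaluated so far is compliant"
  const perGroup: Record<string, Record<string, boolean>> = {};
  for (const r of records) {
    const resources = (perGroup[r.resourceGroup] ??= {});
    resources[r.resourceId] = (resources[r.resourceId] ?? true) && r.compliant;
  }
  const result: Record<string, number> = {};
  for (const group of Object.keys(perGroup)) {
    const states = Object.values(perGroup[group]);
    const compliant = states.filter((ok) => ok).length;
    result[group] = Math.round((compliant / states.length) * 1000) / 10;
  }
  return result;
}
```

Counting per resource rather than per evaluation keeps one heavily-assigned resource from dominating the group's percentage.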

Component: DeploymentAuditLog

Screen Layout: Searchable, sortable table of all deployments with columns: timestamp, service, environment, version, deployed by, risk score, compliance gates summary, duration, and status. Filter by service, environment, date range, and status. Export to CSV.
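Filtering and CSV export for the audit table can be sketched as follows, assuming the row fields of the `audit_log` mock entries; `filterAudit`, `csvField` and `toCsv` are hypothetical helpers. Fields are escaped in the RFC 4180 style (quoted when they contain commas, quotes or line breaks) so free-text values cannot corrupt the exported columns.

```typescript
// Row shape follows the audit_log entries in mock-data/compliance-gates.json.
interface DeploymentAudit {
  timestamp: string; // ISO 8601, UTC ("Z" suffix)
  service: string;
  env: string;
  version: string;
  deployed_by: string;
  risk_score: number;
  gates: string;
  duration: string;
  status: string;
}

// Optional filters for the service, environment, date range, and status controls.
interface AuditFilter {
  service?: string;
  env?: string;
  status?: string;
  from?: string; // inclusive lower bound, e.g. "2026-02-27"
  to?: string;   // exclusive upper bound, e.g. "2026-02-28"
}

function filterAudit(rows: DeploymentAudit[], f: AuditFilter): DeploymentAudit[] {
  return rows
    .filter(
      (r) =>
        (!f.service || r.service === f.service) &&
        (!f.env || r.env === f.env) &&
        (!f.status || r.status === f.status) &&
        // Same-format ISO 8601 UTC strings sort chronologically as plain strings.
        (!f.from || r.timestamp >= f.from) &&
        (!f.to || r.timestamp < f.to),
    )
    // Newest first, using code-unit comparison rather than locale collation.
    .sort((a, b) => (a.timestamp < b.timestamp ? 1 : a.timestamp > b.timestamp ? -1 : 0));
}

// Quote a field only when it contains a comma, quote, or line break; double inner quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

const AUDIT_COLUMNS: (keyof DeploymentAudit)[] = [
  "timestamp", "service", "env", "version", "deployed_by",
  "risk_score", "gates", "duration", "status",
];

function toCsv(rows: DeploymentAudit[]): string {
  const header = AUDIT_COLUMNS.join(",");
  const lines = rows.map((r) => AUDIT_COLUMNS.map((c) => csvField(r[c])).join(","));
  return [header, ...lines].join("\r\n");
}
```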

Mock Data

mock-data/compliance-gates.json

{
  "services": [
    { "name": "checkout-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 92, "deployment_ready": true },
    { "name": "catalog-api", "policy_compliance": 95, "vuln_count": 2, "tag_compliance": 100, "secure_score": 85, "deployment_ready": false },
    { "name": "order-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 95, "secure_score": 88, "deployment_ready": false },
    { "name": "identity-service", "policy_compliance": 100, "vuln_count": 0, "tag_compliance": 100, "secure_score": 95, "deployment_ready": true },
    { "name": "notification-svc", "policy_compliance": 100, "vuln_count": 1, "tag_compliance": 100, "secure_score": 90, "deployment_ready": true },
    { "name": "payment-processor", "policy_compliance": 98, "vuln_count": 0, "tag_compliance": 100, "secure_score": 91, "deployment_ready": true }
  ],
  "audit_log": [
    { "timestamp": "2026-02-28T14:32:00Z", "service": "checkout-service", "env": "production", "version": "3.4.1", "deployed_by": "jsmith", "risk_score": 35, "gates": "6/6 passed", "duration": "14m", "status": "succeeded" },
    { "timestamp": "2026-02-28T11:15:00Z", "service": "catalog-api", "env": "staging", "version": "2.8.0", "deployed_by": "adoe", "risk_score": 50, "gates": "5/6 passed", "duration": "11m", "status": "failed" },
    { "timestamp": "2026-02-27T16:45:00Z", "service": "order-service", "env": "production", "version": "4.1.2", "deployed_by": "mchen", "risk_score": 25, "gates": "6/6 passed", "duration": "16m", "status": "succeeded" },
    { "timestamp": "2026-02-27T10:20:00Z", "service": "identity-service", "env": "production", "version": "1.9.0", "deployed_by": "kpatel", "risk_score": 70, "gates": "6/6 passed", "duration": "22m", "status": "succeeded" },
    { "timestamp": "2026-02-26T15:00:00Z", "service": "notification-svc", "env": "production", "version": "2.3.5", "deployed_by": "rlee", "risk_score": 15, "gates": "6/6 passed", "duration": "9m", "status": "succeeded" }
  ]
}
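The audit entries encode `gates` as a summary string such as `"5/6 passed"` and `duration` as minutes with an `m` suffix such as `"14m"`. Sorting by duration or showing a gate badge needs these as numbers. A minimal sketch, assuming those two formats are the only ones the mock API emits; the parser names are hypothetical, and unrecognised formats are rejected rather than guessed.

```typescript
interface GateSummary {
  passed: number;
  total: number;
  allPassed: boolean;
}

// Parses strings of the form "<passed>/<total> passed", e.g. "5/6 passed".
function parseGateSummary(text: string): GateSummary {
  const match = /^(\d+)\/(\d+) passed$/.exec(text.trim());
  if (!match) {
    throw new Error(`Unrecognised gate summary: ${text}`);
  }
  const passed = Number(match[1]);
  const total = Number(match[2]);
  return { passed, total, allPassed: passed === total };
}

// Parses durations of the form "<minutes>m", e.g. "14m", into whole minutes.
function parseDurationMinutes(text: string): number {
  const match = /^(\d+)m$/.exec(text.trim());
  if (!match) {
    throw new Error(`Unrecognised duration: ${text}`);
  }
  return Number(match[1]);
}
```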

Unified Demo Application Specification

Acme Feature Console (Azure)

Tech Stack: React 18, TypeScript, Tailwind CSS, Recharts, Mock API layer

The unified demo application brings together all four modules into a single dashboard experience for managing the complete feature lifecycle.

Key Screens

Feature Flag Dashboard — All flags with state, targeting rules, rollout percentages, and TTL tracking
Canary Monitor — Real-time canary deployment health, traffic weight, and success rate metrics
Pipeline Dashboard — Pipeline runs, stage status, success rates, and deployment frequency
PR Workflow Board — Kanban view of PRs with quality gates and review status
CAB Dashboard — Change requests with risk scores, auto-approval history, and approval queue
Compliance Gateway — Pre-deployment checklist with gate pass/fail status
Deployment Audit Log — Searchable history of all deployments with compliance data

Architecture Overview

architecture/component-tree.ts

// Component Tree
App
├── Layout
│   ├── TopBar (theme toggle, user menu)
│   └── Sidebar (navigation)
├── Pages
│   ├── FeatureFlagDashboard
│   │   ├── FlagTable (sortable, filterable)
│   │   ├── FlagDetail (targeting rules, audit log)
│   │   └── CreateFlagModal
│   ├── CanaryMonitor
│   │   ├── CanaryCard (per deployment)
│   │   ├── MetricsChart (Recharts)
│   │   └── RollbackButton
│   ├── PipelineDashboard
│   │   ├── PipelineRunTable
│   │   ├── StageIndicator
│   │   └── DeploymentTimeline
│   ├── PRWorkflowBoard
│   │   ├── KanbanColumn (per status)
│   │   ├── PRCard
│   │   └── QualityGateBadges
│   ├── CABDashboard
│   │   ├── ChangeRequestQueue
│   │   ├── RiskScoreBadge
│   │   └── ApprovalHistory
│   ├── ComplianceGateway
│   │   ├── GateChecklist
│   │   ├── PolicyDashboard
│   │   └── ReadinessIndicator
│   └── AuditLog
│       ├── AuditTable (searchable)
│       └── ExportButton
└── Services
    ├── apiClient.ts (mock fetch layer)
    ├── flagService.ts
    ├── pipelineService.ts
    └── complianceService.ts
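`apiClient.ts` is described only as a mock fetch layer. One way to build it, sketched under assumptions: a hypothetical `mockGet` resolves registered paths to in-memory fixtures after a simulated delay, so screens exercise the same asynchronous code path they would use against a real backend. The fixture is a two-entry excerpt of `mock-data/compliance-gates.json`; the function names, error class and default latency are illustrative.

```typescript
// Hypothetical fixture registry: API path -> JSON-serialisable payload.
const fixtures: Record<string, unknown> = {
  "/api/services": [
    { name: "checkout-service", policy_compliance: 100, vuln_count: 0, tag_compliance: 100, secure_score: 92, deployment_ready: true },
    { name: "catalog-api", policy_compliance: 95, vuln_count: 2, tag_compliance: 100, secure_score: 85, deployment_ready: false },
  ],
};

class MockApiError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
    this.name = "MockApiError";
  }
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves a GET against the fixtures after a simulated network delay.
// Returns a deep copy so callers cannot mutate the shared fixture data.
async function mockGet<T>(path: string, latencyMs = 150): Promise<T> {
  await delay(latencyMs);
  if (!Object.prototype.hasOwnProperty.call(fixtures, path)) {
    throw new MockApiError(404, `No mock registered for GET ${path}`);
  }
  return JSON.parse(JSON.stringify(fixtures[path])) as T;
}
```

Returning copies keeps one screen's local edits (for example, an optimistic update) from leaking into another screen's view of the same fixture.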

Data Model

types/data-model.ts

// Core Entities
FeatureFlag: { name, status, rollout_pct, env, targeting, ttl_days_remaining, created, team }
CanaryDeployment: { service, phase, weight_pct, success_rate, p99_latency, started_at }
PipelineRun: { id, pipeline, trigger, branch, stages, duration_min, actor, started_at }
PullRequest: { id, title, author, status, reviewers[], risk_score, quality_gates }
ChangeRequest: { id, service, summary, risk_score, risk_level, status, approvers[] }
ComplianceGate: { name, status, details, timestamp }
PolicyViolation: { resource_id, policy_name, effect, severity, detected_at }
DeploymentAudit: { timestamp, service, env, version, deployed_by, risk_score, gates, duration, status }
ServiceHealth: { name, policy_compliance, vuln_count, tag_compliance, secure_score, deployment_ready }
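The entity notation above is shorthand rather than compilable TypeScript. As a sketch of how one entity might be typed for the mock layer, the following gives `ServiceHealth` an interface (field types inferred from the sample values in the mock data, including the `tag_compliance` field the mock data carries) and a runtime guard that rejects malformed fixtures before they reach a screen. The guard names are hypothetical, and the 0 to 100 ranges follow the percentage fields and the Secure Score definition in the glossary.

```typescript
interface ServiceHealth {
  name: string;
  policy_compliance: number; // percentage, 0 to 100
  vuln_count: number;        // non-negative integer
  tag_compliance: number;    // percentage, 0 to 100
  secure_score: number;      // 0 to 100
  deployment_ready: boolean;
}

const isPercent = (v: unknown): v is number =>
  typeof v === "number" && v >= 0 && v <= 100;

// Runtime guard for untyped JSON: checks field presence, types, and ranges.
function isServiceHealth(value: unknown): value is ServiceHealth {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    isPercent(v.policy_compliance) &&
    typeof v.vuln_count === "number" &&
    Number.isInteger(v.vuln_count) &&
    v.vuln_count >= 0 &&
    isPercent(v.tag_compliance) &&
    isPercent(v.secure_score) &&
    typeof v.deployment_ready === "boolean"
  );
}
```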

Full API Schema

Method	Endpoint	Module	Description
`GET`	`/api/flags`	M1	List all feature flags with state and targeting
`POST`	`/api/flags`	M1	Create a new feature flag
`PUT`	`/api/flags/{name}`	M1	Update flag state or targeting rules
`DELETE`	`/api/flags/{name}`	M1	Retire and remove a feature flag
`GET`	`/api/canary/{service}`	M1	Get canary deployment status
`POST`	`/api/canary/{service}/rollback`	M1	Trigger canary rollback
`GET`	`/api/pipelines/runs`	M2	List pipeline runs with stage status
`GET`	`/api/pipelines/runs/{id}`	M2	Get detailed pipeline run info
`GET`	`/api/pipelines/metrics`	M2	Get pipeline success rate and deployment frequency
`POST`	`/api/pipelines/trigger`	M2	Manually trigger a pipeline run
`GET`	`/api/prs`	M3	List PRs with review and quality gate status
`GET`	`/api/prs/{id}/risk`	M3	Get risk assessment for a PR
`GET`	`/api/cab/queue`	M3	List pending CAB change requests
`POST`	`/api/cab/{id}/approve`	M3	Approve a change request
`GET`	`/api/compliance/readiness/{service}`	M4	Get deployment readiness checklist
`GET`	`/api/compliance/policies`	M4	List policy compliance per resource group
`GET`	`/api/compliance/violations`	M4	List active policy violations
`GET`	`/api/audit/deployments`	M4	Query deployment audit trail
`GET`	`/api/services`	All	List all services with health overview
`GET`	`/api/services/{name}/health`	All	Detailed service health and compliance

Azure → AWS Service Mapping

Capability	Azure Version	AWS Equivalent
Feature Flags	Azure App Configuration Feature Manager	AWS AppConfig Feature Flags
CI/CD Pipelines	Azure DevOps Pipelines (YAML)	AWS CodePipeline / GitHub Actions
Container Registry	Azure Container Registry (ACR)	Amazon ECR
Kubernetes	Azure Kubernetes Service (AKS)	Amazon EKS
Policy Engine	Azure Policy	AWS Config Rules / SCPs
Security Posture	Microsoft Defender for Cloud	AWS Security Hub
Secrets Management	Azure Key Vault	AWS Secrets Manager
Monitoring / APM	Application Insights + Azure Monitor	CloudWatch + X-Ray
Traffic Management	Azure Traffic Manager	Route 53 weighted routing
Service Mesh	Istio on AKS / Azure Service Mesh	AWS App Mesh
Artifact Repository	Azure Artifacts	AWS CodeArtifact
Work Item Tracking	Azure Boards	Jira / AWS CodeCatalyst
Code Repository	Azure Repos	AWS CodeCommit / GitHub
Log Analytics	Azure Log Analytics (KQL)	CloudWatch Logs Insights