EC2 Blue/Green CloudFormation Deployments without DNS changes

Published on 10 February 2021

Introduction

Below is a CloudFormation template for Blue/Green deployments using EC2 resources. In a Blue/Green deployment, instead of deploying to staging and then rolling the release into production, you deploy to one of two environments (blue or green), test against it, and when testing is complete, switch live traffic over and remove the old environment. Your test environment is therefore always like production, and you are not paying for an environment that sits idle except during testing.

This template was created to allow fast failover and failback without waiting for DNS propagation or EC2 instance replacement. Instead of rotating instances in and out of a single target group, it changes which target group the live load balancer forwards to. This allows a near-instant switch between environments and, if there is a problem, a near-instant failback. In practice the switch seems to take around a second.

The non-live environment is created and deleted according to the current phase of the deployment. A staging load balancer is also created in the phases where the non-live environment is under test, and it points at that environment.

The phase is read from an SSM parameter (Parameter Store), which dictates which phase of the deployment you are in. The idea is that you can run every CloudFormation update from a single template and simply change the SSM parameter each time. Alternatively, you could pass the phase in on the command line. All resources are managed by CloudFormation.
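The template expects the phase to already exist in Parameter Store before the stack is created. As a minimal sketch, a separate bootstrap template could seed it. The logical ID `BlueGreenPhaseParameter`, the starting value and the `AllowedPattern` are assumptions for illustration; only the parameter name `BlueGreenPhase` comes from the template in this post.

```yaml
# Bootstrap template (illustrative, not part of the original stack):
# seeds the SSM parameter that the main template reads as BuildPhase.
AWSTemplateFormatVersion: '2010-09-09'
Description: Seed parameter for the Blue/Green phase
Resources:
  BlueGreenPhaseParameter:          # hypothetical logical ID
    Type: AWS::SSM::Parameter
    Properties:
      Name: BlueGreenPhase          # must match the Default of BuildPhase
      Type: String
      Value: Phase1                 # assumed starting phase: Blue live, nothing else
      AllowedPattern: '^Phase[1-8]$' # reject anything that is not Phase1..Phase8
      Description: Current phase of the Blue/Green deployment
```

This has to live in a different stack from the main template, because the main stack resolves the parameter before it creates anything. If the value is later changed from a pipeline or the console, this bootstrap stack will show drift, which is harmless as long as it is not redeployed over the live value.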

The template defines an output with the DNS name of the staging load balancer, for those phases where it exists. The CloudFormation update can be triggered from various tools, including Azure DevOps.

Phases

The template is designed around eight phases.

  1. Blue is live. No Green or staging resources created.
  2. Blue is live. Green is spun up for staging, alongside a staging load balancer that points to the Green environment.
  3. Blue is live. When testing is complete in Green, the staging load balancer is removed in preparation for pointing live to Green.
  4. Green is live. The live load balancer listeners are repointed to the Green target groups. The Blue environment is still kept up in case of any issues following the switch over.
  5. Green is live. The Blue environment is removed.
  6. Green is live. Blue is spun up for staging, alongside a staging load balancer that points to the Blue environment.
  7. Green is live. When testing is complete in Blue, the staging load balancer is removed in preparation for pointing live to Blue.
  8. Blue is live. The live load balancer listeners are repointed to the Blue target groups. The Green environment is still kept up in case of any issues following the switch over.

After Phase 8, the next step is to go back around to Phase 1, which removes the Green environment.

If issues are found following the switch, you can move backwards to revert to the previous environment.
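The phase list above can also be read as a table of which environment is live and which resources exist. As a sketch only, and not how the template below is written, the same table could be expressed as a CloudFormation Mapping with the conditions derived from it. The map name `PhaseMap` and its keys are made up for illustration, the fragment relies on the `BuildPhase` parameter from the main template, and this variant is untested.

```yaml
# Illustrative alternative (not from the original template):
# one row per phase, mirroring the numbered list above.
Mappings:
  PhaseMap:
    Phase1: { Live: Blue,  Blue: 'true',  Green: 'false', Staging: 'false' }
    Phase2: { Live: Blue,  Blue: 'true',  Green: 'true',  Staging: 'true'  }
    Phase3: { Live: Blue,  Blue: 'true',  Green: 'true',  Staging: 'false' }
    Phase4: { Live: Green, Blue: 'true',  Green: 'true',  Staging: 'false' }
    Phase5: { Live: Green, Blue: 'false', Green: 'true',  Staging: 'false' }
    Phase6: { Live: Green, Blue: 'true',  Green: 'true',  Staging: 'true'  }
    Phase7: { Live: Green, Blue: 'true',  Green: 'true',  Staging: 'false' }
    Phase8: { Live: Blue,  Blue: 'true',  Green: 'true',  Staging: 'false' }

Conditions:
  # Each condition looks up one column of the row for the current phase.
  CreateBlueASG:    !Equals [!FindInMap [PhaseMap, !Ref BuildPhase, Blue], 'true']
  CreateGreenASG:   !Equals [!FindInMap [PhaseMap, !Ref BuildPhase, Green], 'true']
  CreateStagingNLB: !Equals [!FindInMap [PhaseMap, !Ref BuildPhase, Staging], 'true']
  BlueListeners:    !Equals [!FindInMap [PhaseMap, !Ref BuildPhase, Live], Blue]
```

Read row by row, moving one phase forward or backward never removes the environment that is currently live, which is what makes stepping back a phase a safe way to revert.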

The idea is that a tool such as Azure DevOps could update the SSM parameter and run the CloudFormation update as part of a pipeline. This was created because CodeDeploy currently doesn't offer enough options around the time it takes for instances to become ready.
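A rough sketch of what such a pipeline might look like in Azure DevOps follows. It is not taken from the original setup: the stack name, template path and the pipeline variables holding AWS credentials are all assumptions, and it presumes the AWS CLI is available on the build agent.

```yaml
# azure-pipelines.yml (illustrative sketch, not from the original post)
trigger: none   # run manually, choosing the phase each time

parameters:
  - name: phase
    displayName: Deployment phase
    type: string
    default: Phase1
    values: [Phase1, Phase2, Phase3, Phase4, Phase5, Phase6, Phase7, Phase8]

variables:
  stackName: BlueGreen           # hypothetical stack name
  templateFile: bluegreen.yaml   # hypothetical path to the template in the repo

pool:
  vmImage: ubuntu-latest

steps:
  - script: |
      set -e
      # 1. Record the requested phase in Parameter Store.
      aws ssm put-parameter --name BlueGreenPhase --type String --overwrite \
        --value "${{ parameters.phase }}"
      # 2. Re-run the stack update so CloudFormation re-reads the parameter.
      #    Parameters that are not overridden keep the existing stack's values.
      aws cloudformation deploy --stack-name "$(stackName)" \
        --template-file "$(templateFile)"
    displayName: Move to ${{ parameters.phase }}
    env:
      # Secret pipeline variables (assumed to exist) must be mapped explicitly.
      AWS_ACCESS_KEY_ID: $(AWS_ACCESS_KEY_ID)
      AWS_SECRET_ACCESS_KEY: $(AWS_SECRET_ACCESS_KEY)
      AWS_DEFAULT_REGION: $(AWS_DEFAULT_REGION)
```

Restricting `phase` to a fixed list of values keeps a typo from reaching the stack, and reverting after a bad switch is just another run with the previous phase selected.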

The code is here:

    # Blue/Green CloudFormation template
    # Greg Heywood
    # V1.0
    # 14/11/2020


    AWSTemplateFormatVersion: '2010-09-09'
    Description: Blue/Green stack
    Parameters:
      SourceCodeFile:
        Type: String
        Description: Name of zip file that has been uploaded to the code repository
      BuildPhase:
        Type: 'AWS::SSM::Parameter::Value<String>'
        Default: BlueGreenPhase
        Description: This variable will determine the current phase that the environment is in - taken from SSM Parameter - BlueGreenPhase
      SourceAMI:
        Type: 'AWS::SSM::Parameter::Value<String>'
        Default: BlueGreenAMI
        Description: AMI Id - taken from SSM Parameter - BlueGreenAMI
      VpcId:
        Type: 'AWS::EC2::VPC::Id'
        Description: VpcId of your existing Virtual Private Cloud (VPC)
      Subnets:
        Type: 'List<AWS::EC2::Subnet::Id>'
        Description: The list of SubnetIds in your Virtual Private Cloud (VPC)
      LTSecurityGroups:
        Type: 'List<String>'
        Description: The list of Security Groups used by the Launch Template
      KeyName:
        Description: Name of an existing EC2 KeyPair for access to the instances
        Type: 'AWS::EC2::KeyPair::KeyName'
      ASGMinCapacity:
        Type: Number
        Default: 1
        Description: The minimum capacity for the AutoScaling Group
      ASGMaxCapacity:
        Type: Number
        Default: 2
        Description: The maximum capacity for the AutoScaling Group
      ASGDesiredCapacity:
        Type: Number
        Default: 1
        Description: The desired capacity for the AutoScaling Group
      InstanceType:
        Type: String
        Default: 'm5.4xlarge'
        Description: Webserver instance type
      LTIamInstanceProfile:
        Type: String
        Description: Launch Template IAM Instance Profile
      SNSTopicForAutoScalingGroup:
        Type: String
        Description: SNS Notifications

    Conditions:
      Phase2:
        !Equals [!Ref BuildPhase, Phase2]
      Phase6:
        !Equals [!Ref BuildPhase, Phase6]
      # The staging NLB only exists while the non-live environment is under test
      CreateStagingNLB:
        !Or [!Equals [!Ref BuildPhase, Phase2], !Equals [!Ref BuildPhase, Phase6]]
      # Blue exists in every phase except Phase5
      CreateBlueASG:
        !Or [!Equals [!Ref BuildPhase, Phase1], !Equals [!Ref BuildPhase, Phase2], !Equals [!Ref BuildPhase, Phase3], !Equals [!Ref BuildPhase, Phase4], !Equals [!Ref BuildPhase, Phase6], !Equals [!Ref BuildPhase, Phase7], !Equals [!Ref BuildPhase, Phase8]]
      # Green exists in every phase except Phase1
      CreateGreenASG:
        !Or [!Equals [!Ref BuildPhase, Phase2], !Equals [!Ref BuildPhase, Phase3], !Equals [!Ref BuildPhase, Phase4], !Equals [!Ref BuildPhase, Phase5], !Equals [!Ref BuildPhase, Phase6], !Equals [!Ref BuildPhase, Phase7], !Equals [!Ref BuildPhase, Phase8]]
      # Live listeners forward to Blue in these phases
      BlueListeners:
        !Or [!Equals [!Ref BuildPhase, Phase1], !Equals [!Ref BuildPhase, Phase2], !Equals [!Ref BuildPhase, Phase3], !Equals [!Ref BuildPhase, Phase8]]
      # Live listeners forward to Green in these phases
      GreenListeners:
        !Or [!Equals [!Ref BuildPhase, Phase4], !Equals [!Ref BuildPhase, Phase5], !Equals [!Ref BuildPhase, Phase6], !Equals [!Ref BuildPhase, Phase7]]

    Resources:
      #### Shared Resources ####
      # Live-NLB
      LiveElasticLoadBalancer:
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
        Properties:
          IpAddressType: ipv4
          Name: "BlueGreen-Live-NLB"
          Scheme: internal
          Subnets: !Ref Subnets
          LoadBalancerAttributes:
            - Key: load_balancing.cross_zone.enabled
              Value: true
          Tags:
            - Key: "Name"
              Value: "BlueGreen-Live-NLB"
            - Key: 'CreatedBy'
              Value: !Ref 'AWS::StackName'
          Type: network

      # Prod Listener 443
      ProdLBTargetGroupListener443:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Properties:
          DefaultActions:
            - Type: forward
              TargetGroupArn: !If [BlueListeners, !Ref BlueLoadBalancerTargetGroup443, !Ref GreenLoadBalancerTargetGroup443]
          LoadBalancerArn: !Ref LiveElasticLoadBalancer
          Port: 443
          Protocol: TCP
      # Prod Listener 80
      ProdLBTargetGroupListener80:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Properties:
          DefaultActions:
            - Type: forward
              TargetGroupArn: !If [BlueListeners, !Ref BlueLoadBalancerTargetGroup80, !Ref GreenLoadBalancerTargetGroup80]
          LoadBalancerArn: !Ref LiveElasticLoadBalancer
          Port: 80
          Protocol: TCP
      # Staging NLB
      StagingElasticLoadBalancer:
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
        Condition: CreateStagingNLB
        Properties:
          IpAddressType: ipv4
          Name: "BlueGreen-Staging-NLB"
          Scheme: internal
          Subnets: !Ref Subnets
          LoadBalancerAttributes:
            - Key: load_balancing.cross_zone.enabled
              Value: true
          Tags:
            - Key: "Name"
              Value: "BlueGreen-Staging-NLB"
            - Key: 'CreatedBy'
              Value: !Ref 'AWS::StackName'
          Type: network
      # Staging Listener 443
      StagingLBTargetGroupListener443:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Condition: CreateStagingNLB
        Properties:
          DefaultActions:
            - Type: forward
              TargetGroupArn: !If [Phase6, !Ref BlueLoadBalancerTargetGroup443, !If [Phase2, !Ref GreenLoadBalancerTargetGroup443, !Ref "AWS::NoValue"]]
          LoadBalancerArn: !Ref StagingElasticLoadBalancer
          Port: 443
          Protocol: TCP
      # Staging Listener 80
      StagingLBTargetGroupListener80:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Condition: CreateStagingNLB
        Properties:
          DefaultActions:
            - Type: forward
              TargetGroupArn: !If [Phase6, !Ref BlueLoadBalancerTargetGroup80, !If [Phase2, !Ref GreenLoadBalancerTargetGroup80, !Ref "AWS::NoValue"]]
          LoadBalancerArn: !Ref StagingElasticLoadBalancer
          Port: 80
          Protocol: TCP


      #### Blue Resources ####
      # Blue TG 443
      BlueLoadBalancerTargetGroup443:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Condition: CreateBlueASG
        Properties:
          Name: BlueGreen-TG443-Blue
          Port: 443
          Protocol: TCP
          TargetType: instance
          VpcId: !Ref VpcId
          TargetGroupAttributes:
            - Key: deregistration_delay.timeout_seconds
              Value: 60
          Tags:
            - Key: Name
              Value: BlueGreen-TG443-Blue
            - Key: 'BuildPhase'
              Value: !Ref BuildPhase
      # Blue TG 80
      BlueLoadBalancerTargetGroup80:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Condition: CreateBlueASG
        Properties:
          Name: BlueGreen-TG80-Blue
          Port: 80
          Protocol: TCP
          TargetType: instance
          VpcId: !Ref VpcId
          TargetGroupAttributes:
            - Key: deregistration_delay.timeout_seconds
              Value: 60
          Tags:
            - Key: Name
              Value: BlueGreen-TG80-Blue
            - Key: 'BuildPhase'
              Value: !Ref BuildPhase
      # Blue Launch Template
      BlueLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Condition: CreateBlueASG
        Properties:
          LaunchTemplateData:
            InstanceType: !Ref InstanceType
            EbsOptimized: 'true'
            DisableApiTermination: 'false'
            IamInstanceProfile:
              Name: !Ref LTIamInstanceProfile
            KeyName: !Ref KeyName
            ImageId: !Ref SourceAMI
            SecurityGroupIds: !Ref LTSecurityGroups
            Monitoring:
              Enabled: True
            UserData:
              Fn::Base64:
                Fn::Sub: |
                  <powershell>
                  Start-Transcript -Path "C:\userdata-transcript_$(Get-Date -format 'yyyy-MM-dd_HH-mm-ss').log"
                  # Whatever userdata you normally use
                  Stop-Transcript
                  </powershell>
                  <persist>true</persist>
          LaunchTemplateName: BlueGreen-LT-BLUE
      # Blue ASG (add TargetGroupARNs)
      BlueAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Condition: CreateBlueASG
        Properties:
          AutoScalingGroupName: BlueGreen-ASG-Blue
          MinSize: !Ref ASGMinCapacity
          MaxSize: !Ref ASGMaxCapacity
          DesiredCapacity: !Ref ASGDesiredCapacity
          LaunchTemplate:
            # Reference the resource (not its name) so CloudFormation creates
            # the launch template before the Auto Scaling group
            LaunchTemplateId: !Ref BlueLaunchTemplate
            Version: '1'
          VPCZoneIdentifier: !Ref Subnets
          TargetGroupARNs:
            - !Ref BlueLoadBalancerTargetGroup443
            - !Ref BlueLoadBalancerTargetGroup80
          TerminationPolicies:
            - Default
          MetricsCollection:
            - Granularity: "1Minute"
              Metrics:
                - 'GroupMaxSize'
                - 'GroupMinSize'
                - 'GroupStandbyInstances'
                - 'GroupInServiceInstances'
                - 'GroupPendingInstances'
                - 'GroupTotalInstances'
                - 'GroupDesiredCapacity'
                - 'GroupTerminatingInstances'
          NotificationConfigurations:
            - TopicARN: !Ref SNSTopicForAutoScalingGroup
              NotificationTypes:
                - autoscaling:EC2_INSTANCE_LAUNCH
                - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
                - autoscaling:EC2_INSTANCE_TERMINATE
                - autoscaling:EC2_INSTANCE_TERMINATE_ERROR
          Tags:
            - Key: "Name"
              PropagateAtLaunch: true
              Value: BlueGreen-ASG-Blue
            - Key: 'CreatedBy'
              PropagateAtLaunch: true
              Value: !Ref 'AWS::StackName'
            - Key: 'BuildPhase'
              PropagateAtLaunch: true
              Value: !Ref BuildPhase

      BlueASGScalingPolicy:
        Type: AWS::AutoScaling::ScalingPolicy
        Condition: CreateBlueASG
        Properties:
          AutoScalingGroupName: !Ref BlueAutoScalingGroup
          PolicyType: TargetTrackingScaling
          EstimatedInstanceWarmup: 1100
          TargetTrackingConfiguration:
            PredefinedMetricSpecification:
              PredefinedMetricType: ASGAverageCPUUtilization
            TargetValue: 50

      BlueASGLifeCycleHook:
        Type: AWS::AutoScaling::LifecycleHook
        Condition: CreateBlueASG
        Properties:
          AutoScalingGroupName: !Ref BlueAutoScalingGroup
          DefaultResult: CONTINUE
          HeartbeatTimeout: 600
          LifecycleHookName: RemoveFromDomainHook
          LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING

      #### Green Resources ####
      # Green TG 443
      GreenLoadBalancerTargetGroup443:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Condition: CreateGreenASG
        Properties:
          Name: BlueGreen-TG443-Green
          Port: 443
          Protocol: TCP
          TargetType: instance
          VpcId: !Ref VpcId
          TargetGroupAttributes:
            - Key: deregistration_delay.timeout_seconds
              Value: 60
          Tags:
            - Key: Name
              Value: BlueGreen-TG443-Green
            - Key: 'BuildPhase'
              Value: !Ref BuildPhase
      # Green TG 80
      GreenLoadBalancerTargetGroup80:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Condition: CreateGreenASG
        Properties:
          Name: BlueGreen-TG80-Green
          Port: 80
          Protocol: TCP
          TargetType: instance
          VpcId: !Ref VpcId
          TargetGroupAttributes:
            - Key: deregistration_delay.timeout_seconds
              Value: 60
          Tags:
            - Key: Name
              Value: BlueGreen-TG80-Green
            - Key: 'BuildPhase'
              Value: !Ref BuildPhase
      # Green Launch Template
      GreenLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Condition: CreateGreenASG
        Properties:
          LaunchTemplateData:
            InstanceType: !Ref InstanceType
            EbsOptimized: 'true'
            DisableApiTermination: 'false'
            IamInstanceProfile:
              Name: !Ref LTIamInstanceProfile
            KeyName: !Ref KeyName
            ImageId: !Ref SourceAMI
            SecurityGroupIds: !Ref LTSecurityGroups
            Monitoring:
              Enabled: True
            UserData:
              Fn::Base64:
                Fn::Sub: |
                  <powershell>
                  Start-Transcript -Path "C:\userdata-transcript_$(Get-Date -format 'yyyy-MM-dd_HH-mm-ss').log"
                  # Whatever userdata you normally use
                  Stop-Transcript
                  </powershell>
                  <persist>true</persist>
          LaunchTemplateName: BlueGreen-LT-GREEN
      # Green ASG (add TargetGroupARNs)
      GreenAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Condition: CreateGreenASG
        Properties:
          AutoScalingGroupName: BlueGreen-ASG-Green
          MinSize: !Ref ASGMinCapacity
          MaxSize: !Ref ASGMaxCapacity
          DesiredCapacity: !Ref ASGDesiredCapacity
          LaunchTemplate:
            # Reference the resource (not its name) so CloudFormation creates
            # the launch template before the Auto Scaling group
            LaunchTemplateId: !Ref GreenLaunchTemplate
            Version: '1'
          VPCZoneIdentifier: !Ref Subnets
          TargetGroupARNs:
            - !Ref GreenLoadBalancerTargetGroup443
            - !Ref GreenLoadBalancerTargetGroup80
          TerminationPolicies:
            - Default
          MetricsCollection:
            - Granularity: "1Minute"
              Metrics:
                - 'GroupMaxSize'
                - 'GroupMinSize'
                - 'GroupStandbyInstances'
                - 'GroupInServiceInstances'
                - 'GroupPendingInstances'
                - 'GroupTotalInstances'
                - 'GroupDesiredCapacity'
                - 'GroupTerminatingInstances'
          NotificationConfigurations:
            - TopicARN: !Ref SNSTopicForAutoScalingGroup
              NotificationTypes:
                - autoscaling:EC2_INSTANCE_LAUNCH
                - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
                - autoscaling:EC2_INSTANCE_TERMINATE
                - autoscaling:EC2_INSTANCE_TERMINATE_ERROR
          Tags:
            - Key: "Name"
              PropagateAtLaunch: true
              Value: BlueGreen-ASG-Green
            - Key: 'CreatedBy'
              PropagateAtLaunch: true
              Value: !Ref 'AWS::StackName'
            - Key: 'BuildPhase'
              PropagateAtLaunch: true
              Value: !Ref BuildPhase

      GreenASGScalingPolicy:
        Type: AWS::AutoScaling::ScalingPolicy
        Condition: CreateGreenASG
        Properties:
          AutoScalingGroupName: !Ref GreenAutoScalingGroup
          PolicyType: TargetTrackingScaling
          EstimatedInstanceWarmup: 1100
          TargetTrackingConfiguration:
            PredefinedMetricSpecification:
              PredefinedMetricType: ASGAverageCPUUtilization
            TargetValue: 50

      GreenASGLifeCycleHook:
        Type: AWS::AutoScaling::LifecycleHook
        Condition: CreateGreenASG
        Properties:
          AutoScalingGroupName: !Ref GreenAutoScalingGroup
          DefaultResult: CONTINUE
          HeartbeatTimeout: 600
          LifecycleHookName: RemoveFromDomainHook
          LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING


    Outputs:
      BackupLoadBalancerDNSName:
        Description: The DNSName of the Staging Load Balancer
        Value: !GetAtt StagingElasticLoadBalancer.DNSName
        Condition: CreateStagingNLB

You can also find the code on GitHub.
