Blue-green deployments and rolling deployments (canary tests) let you release new software gradually, and identify and mitigate the potential blast radius of a failed release. As a result, you can release new software with near-zero downtime.
In a blue-green deployment, the current service acts as the blue environment. When a new version of the service is available, you deploy the new service and underlying infrastructure into a new green environment. When the green environment is ready and tested, traffic is redirected from the blue environment to the green one. This process ensures that:
- Errors are identified when you test the green environment. If there are any errors, traffic continues to route to the blue environment, so the failed release causes near-zero downtime.
- If errors are detected after the green environment has been promoted, rolling back to the previous version is straightforward — traffic is redirected back to the blue environment.
In addition to blue-green deployments, canary tests and rolling deployments release the new version of the service to a small group of users first, thereby reducing the blast radius if the green environment fails.
Canary tests can be done with blue-green environments. After the green environment is ready, the load balancer sends a small fraction of the traffic to the green environment (for example, 10%).
If no errors are identified during the canary test, you incrementally direct more traffic to the green environment (for example, a 50/50 split). Finally, the load balancer directs all traffic to the green environment. Once you are comfortable, destroy the old blue environment. The green environment is now the current production service.
![Rolling deployment. After the initial canary test, traffic to the green environment is split evenly with the blue environment (50/50). Finally, all traffic is directed to the green environment.][rolling-deployment]
AWS Application Load Balancers (ALBs) automatically distribute incoming traffic to the appropriate service at the application layer. ALBs differ from Classic Load Balancers, which only route traffic to EC2 instances across multiple Availability Zones. You can define an ALB's listener rules and target groups to dynamically route traffic to services. These rules enable you to run canary tests against the green environment and incrementally promote it.
In this tutorial, you will do the following with Terraform:
- Provision underlying resources (VPC, security groups, load balancers) and a set of web servers to serve as the blue environment.
- Provision a second set of web servers to serve as the green environment.
- Add feature toggles to your Terraform configuration to define a list of potential deployment strategies.
- Use feature toggles to conduct a canary test and incrementally promote your green environment.
»Prerequisites
This tutorial assumes you are familiar with the standard Terraform workflow. If you are unfamiliar with Terraform, complete the Get Started tutorials first.
For this tutorial, you will need:
- Terraform 0.14+ installed locally
- an AWS account
»Explore the workspace
Clone the Learn Terraform Advanced Deployment Strategies repository.
Navigate to the repository directory in your terminal.
This directory is a Terraform workspace with multiple files:

- `main.tf` contains the configuration to deploy this tutorial's VPC, security groups, and load balancers.
- `variables.tf` defines variables used by the configuration, such as region, CIDR blocks, number of subnets, etc.
- `blue.tf` contains the configuration to deploy 2 AWS instances and start a web server. These instances represent a sample application's "version 1.0".
- `init-script.tf` contains the script to start the web server.
- `versions.tf` contains a `terraform` block which specifies the Terraform binary version and AWS provider version.
- `terraform.lock.hcl` is the Terraform dependency lock file.
Note: This workspace combines the network (load balancer) and application configuration in one directory as a learning exercise. In a production environment, these configurations would be separated into different workspaces, using `terraform_remote_state` to share data between them.
»Explore main.tf
Open `main.tf`. This file uses the AWS provider to deploy the base infrastructure for this tutorial, including a VPC, subnets, an application security group, and a load balancer security group.
Toward the end of the file, you'll find `aws_lb`, which represents an ALB. When the load balancer receives a request, it evaluates the listener rules, defined by `aws_lb_listener.app`, and routes traffic to the appropriate target group.
This load balancer currently directs all traffic to the blue load balancing target group on port 80.
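The listener described above can be sketched as follows. This is a minimal illustration, not the tutorial's exact configuration: the subnet and security group references (`module.vpc.public_subnets`, `aws_security_group.lb_sg`) are assumptions, while the resource names `aws_lb.app`, `aws_lb_listener.app`, and the blue target group follow the text.

```hcl
# Sketch of the ALB and its listener; supporting resource names are assumptions.
resource "aws_lb" "app" {
  name               = "main-app-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = module.vpc.public_subnets
  security_groups    = [aws_security_group.lb_sg.id]
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = "80"
  protocol          = "HTTP"

  # Before the feature toggles are added, all traffic forwards to blue.
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }
}
```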
»Explore blue.tf
Open `blue.tf`. Here you'll find the configuration to deploy 2 AWS instances that start web servers, which return the text `Version 1.0 - #${count.index}`. This represents the sample application's first version. In addition, this file defines the blue load balancer target group and attaches the blue instances to it using `aws_lb_target_group_attachment`.
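A sketch of what `blue.tf` contains, under assumptions: the AMI data source, instance type, subnet layout, and init script file name are illustrative, while the `user_data` text and the target group attachment follow the tutorial's description.

```hcl
# Illustrative sketch of blue.tf; exact arguments are assumptions.
resource "aws_instance" "blue" {
  count = 2

  ami                    = data.aws_ami.amazon_linux.id # assumed data source
  instance_type          = "t2.micro"
  subnet_id              = module.vpc.public_subnets[count.index % length(module.vpc.public_subnets)]
  vpc_security_group_ids = [aws_security_group.app_sg.id]
  user_data = templatefile("init-script.sh", { # script file name is an assumption
    file_content = "version 1.0 - #${count.index}"
  })

  tags = {
    Name = "blue-${count.index}"
  }
}

resource "aws_lb_target_group" "blue" {
  name     = "blue-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = module.vpc.vpc_id
}

# Attach each blue instance to the blue target group.
resource "aws_lb_target_group_attachment" "blue" {
  count            = length(aws_instance.blue)
  target_group_arn = aws_lb_target_group.blue.arn
  target_id        = aws_instance.blue[count.index].id
  port             = 80
}
```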
»Initialize and apply the configuration
In your terminal, initialize your Terraform workspace by running `terraform init`.
Apply your configuration. Remember to confirm your apply with a `yes`.
»Verify blue environment
Verify that your blue environment was deployed successfully by visiting the load balancer's DNS name in your browser or cURLing it from your terminal.
Note: It may take a few minutes for the load balancer's health checks to be successful and the web servers to respond.
Notice that the load balancer is routing traffic evenly to both VMs in the blue environment.
»Deploy green environment
Create a new file named `green.tf` and paste in the configuration for the sample application's version 1.1.
Notice how this configuration is similar to the blue application, except that the web servers return `green #${count.index}`.
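The green environment can be sketched by mirroring `blue.tf`. In this sketch, the `enable_green_env` toggle and `green_instance_count` variable names, the AMI data source, and the init script file name are assumptions; the `user_data` text follows the tutorial.

```hcl
# Sketch of green.tf mirroring blue.tf; variable names are assumptions.
resource "aws_instance" "green" {
  count = var.enable_green_env ? var.green_instance_count : 0

  ami                    = data.aws_ami.amazon_linux.id # assumed data source
  instance_type          = "t2.micro"
  subnet_id              = module.vpc.public_subnets[count.index % length(module.vpc.public_subnets)]
  vpc_security_group_ids = [aws_security_group.app_sg.id]
  user_data = templatefile("init-script.sh", { # script file name is an assumption
    file_content = "version 1.1 - green #${count.index}"
  })

  tags = {
    Name = "green-${count.index}"
  }
}

resource "aws_lb_target_group" "green" {
  name     = "green-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = module.vpc.vpc_id
}

resource "aws_lb_target_group_attachment" "green" {
  count            = length(aws_instance.green)
  target_group_arn = aws_lb_target_group.green.arn
  target_id        = aws_instance.green[count.index].id
  port             = 80
}
```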
Add the following variables to `variables.tf`.
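The variable definitions might look like the following; the names `enable_green_env` and `green_instance_count` and their defaults are assumptions consistent with the sketch of the green environment above.

```hcl
# Assumed variable definitions supporting the green environment.
variable "enable_green_env" {
  description = "Enable green environment"
  type        = bool
  default     = true
}

variable "green_instance_count" {
  description = "Number of instances in the green environment"
  type        = number
  default     = 2
}
```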
Apply your configuration to deploy your green application. Remember to confirm your apply with a `yes`.
»Add feature toggles to route traffic
Even though you deployed your green environment, the load balancer is currently not routing traffic to it.
While you could manually modify the load balancer's target groups to include the green environment, using feature toggles codifies this change for you. In this step, you will add a `traffic_distribution` variable and a `traffic_dist_map` local value to your configuration. The configuration will automatically assign each target group's weight based on the `traffic_distribution` variable.
First, add the configuration for the local value and the traffic distribution variable to `variables.tf`.
Notice that there are five traffic distributions defined by the local variable. Each traffic distribution specifies the weight for the respective target group.
- The `blue` distribution is what is currently applied: 100% of traffic routes to the blue environment, 0% to the green environment.
- The `blue-90` distribution simulates a canary test, routing 90% of traffic to the blue environment and 10% to the green environment.
- The `split` distribution builds on the canary test by increasing traffic to the green environment, splitting traffic evenly between the blue and green environments (50/50).
- The `green-90` distribution increases traffic to the green environment: 90% of traffic routes to the green environment, 10% to the blue environment.
- The `green` distribution completely promotes the green environment: the load balancer routes 100% of traffic to it.
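Expressed as configuration, the five distributions form a map of per-target-group weights. The weights below follow the tutorial's description; the exact shape of the map is a sketch.

```hcl
# Weights per target group for each named distribution.
locals {
  traffic_dist_map = {
    blue = {
      blue  = 100
      green = 0
    }
    blue-90 = {
      blue  = 90
      green = 10
    }
    split = {
      blue  = 50
      green = 50
    }
    green-90 = {
      blue  = 10
      green = 90
    }
    green = {
      blue  = 0
      green = 100
    }
  }
}

variable "traffic_distribution" {
  description = "Levels of traffic distribution"
  type        = string
}
```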
Replace the `aws_lb_listener.app` default action block in `main.tf`. The configuration uses `lookup` to set each target group's weight. Notice that the configuration defaults to directing all traffic to the blue environment if no value is set.
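The weighted default action can be sketched like this. The `lookup` fallbacks (100 for blue, 0 for green) implement the default of routing all traffic to the blue environment; the surrounding listener arguments repeat earlier assumptions.

```hcl
# Sketch of the listener with a weighted forward action driven by the toggle.
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = lookup(local.traffic_dist_map[var.traffic_distribution], "blue", 100)
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = lookup(local.traffic_dist_map[var.traffic_distribution], "green", 0)
      }

      # Disabled stickiness so the traffic split is easy to observe.
      stickiness {
        enabled  = false
        duration = 1
      }
    }
  }
}
```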
Note: The listener's stickiness is disabled (with a duration of 1 second) to demonstrate how traffic is directed between the two target groups. Refer to this AWS blog for guidance on connection draining and stickiness settings for production environments.
»Start shifting traffic to green environment
Apply your configuration to run a canary test by setting the `traffic_distribution` variable to `blue-90`, for example with `terraform apply -var="traffic_distribution=blue-90"`. Remember to confirm your apply with a `yes`.
»Verify canary deployment traffic
Verify that your load balancer is now routing 10% of the traffic to the green environment.
Note: It may take a few minutes for the load balancer's health checks to be successful and for the green environment to begin responding.
Notice that 10% of the traffic was routed to the green environment.
»Increase traffic to green environment
Now that the canary deployment was successful, increase the traffic to the green environment.
Apply your configuration to increase the rollout by setting the `traffic_distribution` variable to `split`, for example with `terraform apply -var="traffic_distribution=split"`. Remember to confirm your apply with a `yes`.
»Verify rolling deployment traffic
Verify that your load balancer is now splitting the traffic to the blue and green environments.
Note: It may take a few minutes for the load balancer's health checks to be successful. The results may not always show a perfectly even 50/50 split.
Notice how 50% of the traffic was routed to the blue environment and the rest to the green environment.
»Promote green environment
Now that both the canary and rolling deployments were successful, route 100% of the load balancer's traffic to the green environment to promote it.
Apply your configuration to promote the green environment by setting the `traffic_distribution` variable to `green`, for example with `terraform apply -var="traffic_distribution=green"`. Remember to confirm your apply with a `yes`.
»Verify load balancer traffic
Verify that your load balancer is routing all traffic to the green environment.
Note: It may take a few minutes for the load balancer's health checks to be successful.
Congrats! You have successfully promoted your green environment with near-zero downtime.
»Scale down blue environment
Now that you have verified that all traffic is directed to your green environment, it is safe to disable the blue environment.
Apply your configuration to disable the blue environment by setting the `traffic_distribution` variable to `green` and `enable_blue_env` to `false`. Remember to confirm your apply with a `yes`.
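One way to codify this toggle, sketched under the assumption that the blue instance count is gated on a boolean variable (the `enable_blue_env` name comes from the tutorial; `blue_instance_count` is assumed):

```hcl
# Setting enable_blue_env = false brings the blue instance count to zero,
# so Terraform destroys the blue servers on the next apply.
variable "enable_blue_env" {
  description = "Enable blue environment"
  type        = bool
  default     = true
}

variable "blue_instance_count" {
  description = "Number of instances in the blue environment"
  type        = number
  default     = 2
}

resource "aws_instance" "blue" {
  count = var.enable_blue_env ? var.blue_instance_count : 0
  # ... remaining arguments unchanged ...
}
```

With this in place, `terraform apply -var="traffic_distribution=green" -var="enable_blue_env=false"` scales the blue environment down while keeping all traffic on green.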
»Deploy new version
In this tutorial, you deployed the application's version 1.0 on the blue environment, and the new version, 1.1, on the green environment. When you promoted the green environment, it became the current production environment. By alternating the blue and green environments, you can deploy the next release on the blue environment with minimal modifications to your existing configuration.
When you're ready to deploy a new application version, update the blue environment instances.
Modify `aws_instance.blue`'s `user_data` and `tags` in `blue.tf` to display a new version number, 1.2.
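Only the `user_data` text and the `tags` need to change, as sketched below; the `templatefile` call and script file name repeat earlier assumptions about this workspace.

```hcl
# Blue environment reused for version 1.2; only user_data and tags change.
resource "aws_instance" "blue" {
  count = var.enable_blue_env ? var.blue_instance_count : 0

  # ... ami, instance_type, networking arguments unchanged ...

  user_data = templatefile("init-script.sh", { # script file name is an assumption
    file_content = "version 1.2 - #${count.index}"
  })

  tags = {
    Name = "version-1.2-${count.index}"
  }
}
```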
»Enable new version environment
Apply your configuration to provision the new version of your infrastructure. Remember to confirm your apply with a `yes`. Notice that the `traffic_distribution` variable is set to `green`, because the green environment contains your current production environment.
»Start shifting traffic to blue environment
Now that the new version is deployed to the blue environment, start shifting traffic to it.
Apply your configuration to run a canary test by setting the `traffic_distribution` variable to `green-90`, for example with `terraform apply -var="traffic_distribution=green-90"`. Remember to confirm your apply with a `yes`.
Once the apply completes, verify that your load balancer routes 90% of the traffic to the green environment and 10% to the new blue environment.
Note: It may take a few minutes for the load balancer's health checks to be successful.
»Promote blue environment
Now that the canary deployment is successful, fully promote your blue environment.
Apply your configuration to promote the blue environment by setting the `traffic_distribution` variable to `blue`, for example with `terraform apply -var="traffic_distribution=blue"`. Remember to confirm your apply with a `yes`.
Verify that your load balancer is routing all traffic to the blue environment.
Note: It may take a few minutes for the load balancer's health checks to be successful.
Congrats! You have successfully used blue-green, canary, and rolling deployments to ship two releases.
»Clean up your infrastructure
After verifying that the resources were deployed successfully, destroy them by running `terraform destroy`. Remember to respond to the confirmation prompt with `yes`.
»Next steps
In this tutorial, you used an ALB to incrementally deploy a new application release. In addition, you implemented feature toggles to codify and run these advanced deployment techniques in a consistent, reliable manner.
To automate your infrastructure deployment process and to learn more about using load balancers to schedule near-zero downtime releases, visit the following resources.
- The Deploy Terraform infrastructure with CircleCI tutorial
- The Blue/Green & Canary Deployments Nomad tutorial
- The Traffic Splitting for Service Deployments Consul tutorial
- The Fine-tuning blue/green deployments on application load balancer AWS blog post walks you through blue-green deployments using Application Load Balancers. It dives into more detail on setting target-group-level stickiness and enabling connection draining to provide near-zero downtime releases.