Day 1: Deploying Your First Vault Cluster

Disaster Recovery Replication Setup

Every organization needs a disaster recovery (DR) strategy to protect its Vault deployment against the catastrophic failure of an entire cluster. Vault Enterprise supports multi-datacenter deployment, where you can replicate data across datacenters for performance as well as disaster recovery.

A cluster is the basic unit of Vault Enterprise replication, which follows the leader-follower model. The leader cluster is referred to as the primary cluster and is considered the system of record. Data is streamed from the primary cluster to all secondary (follower) clusters.

The Mount Filter guide provides step-by-step instructions on setting up performance replication. This guide focuses on DR replication setup.

»Prerequisites

This intermediate Vault operations guide assumes that you have some working knowledge of Vault.

You need two Vault Enterprise clusters: one behaves as the primary cluster, and another becomes the secondary.

»Important note

This guide walks you through the steps to enable DR replication. If you plan to enable both performance replication and DR replication, consider the following scenario.

Scenario

Before setting up DR replication in Data Center 2, first set up performance replication with Cluster C as a performance secondary. This order matters because the data on a performance secondary cluster is immediately cleared when you enable performance replication on it.

»Workflow

The basic steps to configure DR replication are:

  1. Enable DR Primary Replication

  2. Enable DR Secondary Replication

  3. DR Operation Token Strategy

When a catastrophic failure causes the primary cluster (Cluster A) to be inoperable, promote the DR secondary (Cluster B) to become the new primary.

If the original primary cluster (Cluster A) becomes operational again after you have promoted Cluster B to be the new primary, you can choose one of the following options:

  • Option 1: Demote Cluster A to be a DR secondary of Cluster B.

  • Option 2: Disable DR replication on Cluster A.

After failing over to Cluster B and demoting Cluster A (Option 1), all traffic is routed to Cluster B. If your goal is to make Cluster A the primary again, you can reverse the steps to restore the original topology, as described in DR Failback.

»Enable DR Primary Replication

Enable DR replication on the primary cluster (Cluster A).

$ vault write -f sys/replication/dr/primary/enable
WARNING! The following warnings were returned from Vault:

* This cluster is being enabled as a primary for replication. Vault will be
unavailable for a brief period and will resume service shortly.

Generate a secondary activation token. The id parameter identifies the secondary cluster.

$ vault write sys/replication/dr/primary/secondary-token id="secondary"

The output should look similar to:

Key                              Value
---                              -----
wrapping_token:                  eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhZGRyIjoiaHR0cDovLzEzLjU3LjIwLjQxOjgyMDAiLCJleHAiOjE1MjkzMzkzMzEsImlhdCI6MTUyOTMzNzUzMSwianRpIjoiZDZmMmMzZTItMTZjNS1mNTU0LWYxMzAtNzMzZDE0OWNiNTIzIiwidHlwZSI6IndyYXBwaW5nIn0.MIGIAkIArsC3s1x7GYnEbaYwAbYUj-Wgp4B3Q3kVXL0BbaKvsECySV4Pwtm--i24OSQfI9zAlsG8ZypOWJdngRa59wlhWdQCQgG22-I-aNWPehjsqmwwEADU-u37LUrR6O0MsUCqtfWYwIM9o7PFP1wMZ4JwDGftQXUH6hIrkXZDxnnGsSCJ1Vl75w
wrapping_accessor:               bab0ea36-23f6-d21d-4ca6-a9c3673766a3
wrapping_token_ttl:              30m
wrapping_token_creation_time:    2018-06-18 15:58:51.645117216 +0000 UTC
wrapping_token_creation_path:    sys/replication/dr/primary/secondary-token
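When you script this step, you usually want only the wrapping token rather than the whole table above. The following is a minimal sketch, assuming VAULT_ADDR and VAULT_TOKEN already point at Cluster A; it relies on the CLI's -field flag, which for a wrapped response can print just the wrapping_token. The function names are hypothetical helpers, not part of Vault.

```shell
# Hypothetical helpers for the DR primary (Cluster A).
# Assumes VAULT_ADDR and VAULT_TOKEN target Cluster A with sufficient privileges.

# Enable DR replication on this cluster as the DR primary.
enable_dr_primary() {
  vault write -f sys/replication/dr/primary/enable
}

# Print only the secondary activation (wrapping) token for the given id.
# -field=wrapping_token prints the raw token instead of the key/value table.
create_dr_secondary_token() {
  vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id="$1"
}

# Example:
#   enable_dr_primary
#   DR_SECONDARY_TOKEN="$(create_dr_secondary_token secondary)"
```

Hand the token to the operator of Cluster B before the wrapping_token_ttl elapses.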

»Enable DR Secondary Replication

The following operations must be performed on the DR secondary cluster (Cluster B).

Enable DR replication on the secondary cluster.

$ vault write sys/replication/dr/secondary/enable token="..."

Here, token is the wrapping_token obtained from the primary cluster. Use it before it expires; the example above shows a wrapping_token_ttl of 30 minutes.

Expected output:

WARNING! The following warnings were returned from Vault:

* Vault has successfully found secondary information; it may take a while to
perform setup tasks. Vault will be unavailable until these tasks and initial
sync complete.

Refer to the Monitoring Vault Replication guide to check replication health.
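While the secondary performs its initial sync, a script may need to wait until replication is streaming before moving on. The sketch below polls the DR status endpoint on Cluster B. It assumes that VAULT_ADDR targets Cluster B and that a healthy, streaming secondary reports the state stream-wals; the function name, attempt count, and delay are illustrative choices, not Vault defaults.

```shell
# Hypothetical helper: wait until the DR secondary (Cluster B) is streaming.
# Assumes VAULT_ADDR targets Cluster B and that "stream-wals" is the healthy state.
# Usage: wait_for_dr_stream [attempts] [delay_seconds]
wait_for_dr_stream() {
  local max_attempts="${1:-30}" delay="${2:-10}" attempt=1 state=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # Errors are ignored because Vault is unavailable during initial setup.
    state="$(vault read -field=state sys/replication/dr/status 2>/dev/null || true)"
    if [ "$state" = "stream-wals" ]; then
      echo "DR secondary is streaming (attempt $attempt)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "DR secondary not streaming; last state: ${state:-unknown}" >&2
  return 1
}
```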

»DR Operation Token Strategy

To promote a DR secondary cluster (Cluster B) to be the new primary, you need a DR operation token. However, generating a DR operation token requires a threshold of unseal keys (or recovery keys if auto-unseal is enabled). This can be troublesome: a cluster failure is usually caused by an unexpected incident, and you may not be able to coordinate the key holders in time to generate the DR operation token, even though an immediate failover to the healthy cluster is crucial to your business continuity.

As of Vault 1.4, you can create a batch DR operation token on the DR primary cluster and later use it to promote the DR secondary cluster. Creating this token ahead of time is therefore a strategic step that the Vault administrator can take to prepare for an unexpected loss of the DR primary.

  1. On the DR primary cluster (Cluster A), create a policy named "dr-secondary-promotion" that allows the update operation on the sys/replication/dr/secondary/promote path. In addition, you can allow update on the sys/replication/dr/secondary/update-primary path so that you can use the same DR operation token to update the primary cluster that the secondary cluster points to.

    $ vault policy write dr-secondary-promotion - <<EOF
    path "sys/replication/dr/secondary/promote" {
      capabilities = [ "update" ]
    }
    
    # Allow the same token to point the DR secondary at a new primary
    path "sys/replication/dr/secondary/update-primary" {
      capabilities = [ "update" ]
    }
    EOF
    
  2. Verify that the policy was created.

    $ vault policy list
    
    default
    dr-secondary-promotion
    root
    
  3. Create a token role named "failover-handler" that attaches the dr-secondary-promotion policy and issues batch tokens. Batch tokens cannot be renewed, so set the renewable parameter to false. Also, set the orphan parameter to true.

    $ vault write auth/token/roles/failover-handler \
        allowed_policies=dr-secondary-promotion \
        orphan=true \
        renewable=false \
        token_type=batch
    
  4. Create a token from the failover-handler role with a time-to-live (TTL) of 8 hours.

    $ vault token create -role=failover-handler -ttl=8h
    
    Key                  Value
    ---                  -----
    token                b.AAAAAQLwVSoVwuI9p2kXICbWwe2cshR79nkjRERfAK7iZRFHRnrP_UJ8shdGSToSiqa9xCDts4Kp33IqPmbUd2xwqZ06r0WugLoRZJDswycXemZ11fkWPgrQkh6rLnjGZgGpGUOIA933Laqru1TcSEsTveJziwDaLpZSNABw1jGkej9rVsOxbSk2msPiwoAeE6-dLTebAvk
    token_accessor       n/a
    token_duration       8h
    token_renewable      false
    token_policies       ["default" "dr-secondary-promotion"]
    identity_policies    []
    policies             ["default" "dr-secondary-promotion"]
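Before you put the token away, it can be worth confirming that the policy grants what a failover needs. A minimal sketch using vault token capabilities, which prints the capabilities a token has on a path; the helper name is hypothetical, and it assumes the CLI prints multiple capabilities as a comma-separated list (for example "read, update").

```shell
# Hypothetical helper: verify that a token has "update" on both DR endpoints.
# Assumes VAULT_ADDR targets Cluster A and VAULT_TOKEN may look up capabilities.
# Usage: check_failover_token TOKEN
check_failover_token() {
  local token="$1" dr_path caps
  for dr_path in sys/replication/dr/secondary/promote \
                 sys/replication/dr/secondary/update-primary; do
    caps="$(vault token capabilities "$token" "$dr_path")" || return 1
    # Wrap the list in ", ...," so "update" only matches as a whole item.
    case ", $caps," in
      *", update,"*) echo "ok: $dr_path" ;;
      *) echo "missing update on $dr_path (got: $caps)" >&2; return 1 ;;
    esac
  done
}
```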
    

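Securely store this batch token. If the DR secondary cluster needs to be promoted, you can use this batch token to perform the operation, which eliminates the need for unseal keys (or recovery keys if auto-unseal is enabled) at failover time. Keep in mind that the token stops working once its TTL (8 hours in this example) expires, so choose a TTL that covers the period during which you need it.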
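One way to keep the token ready is sketched below. It writes the token to a file that only the current user can read, and later passes it to the promote endpoint on standard input (the CLI reads a value of - from stdin), so the token never appears on the command line. The file path and helper names are hypothetical; a dedicated secrets store or password manager is usually a better home for break-glass credentials.

```shell
# Hypothetical helpers for the batch DR operation token.

# Issue a batch token from the failover-handler role (run against Cluster A)
# and save it with mode 0600. Usage: save_failover_token FILE [TTL]
save_failover_token() {
  local out_file="$1" ttl="${2:-8h}"
  (
    umask 077   # files created in this subshell are owner read/write only
    vault token create -role=failover-handler -ttl="$ttl" -field=token > "$out_file"
  )
}

# Promote the DR secondary (run against Cluster B) with the saved token.
# Usage: promote_with_token_file FILE
promote_with_token_file() {
  local token
  token="$(cat "$1")" || return 1   # $( ) also strips a trailing newline
  printf '%s' "$token" |
    vault write sys/replication/dr/secondary/promote dr_operation_token=-
}
```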

»Promote DR Secondary to Primary

This step walks you through the promotion of the secondary cluster (Cluster B) to become the new primary when a catastrophic failure causes the primary cluster (Cluster A) to become inoperable.

Refer to the Important Note about Automated DR Failover section for more background information.

You need a DR operation token to perform this task. If you do not have a batch DR operation token, you must generate a DR operation token before you can promote Cluster B. The process below is similar to Generating a Root Token (via CLI): it requires a threshold of unseal keys (or recovery keys if auto-unseal is enabled).

  1. Start the DR operation token generation process.

    $ vault operator generate-root -dr-token -init
    

    The generated output would look like:

    A One-Time-Password has been generated for you and is shown in the OTP field.
    You will need this value to decode the resulting root token, so keep it safe.
    Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started       true
    Progress      0/3
    Complete      false
    OTP           EYHAkPQYvvz93e8iI3pg1maQ
    OTP Length    24
    
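  2. To generate a DR operation token, each unseal key holder must run the following command with their unseal key. Because Cluster B is a DR secondary of Cluster A, these are the unseal keys (or recovery keys) of the primary cluster.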

    Example:

    $ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        <primary_unseal_key_1>
    
    Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started          true
    Progress         1/3
    Complete         false
    
  3. Once the threshold has been reached, the output contains the encoded DR operation token.

    Example:

    $ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        <primary_unseal_key_3>
    
    Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started          true
    Progress         3/3
    Complete         true
    Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf
    
  4. Decode the generated DR operation token (Encoded Token).

    Example:

    $ vault operator generate-root -dr-token \
         -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
         -otp="EYHAkPQYvvz93e8iI3pg1maQ"
    
    s.3epDv29lsVfc0oZadkjs6qRN
    
  5. Finally, promote the DR secondary (Cluster B) to become the new primary, passing the DR operation token with the request.

    Example:

    $ vault write sys/replication/dr/secondary/promote \
         dr_operation_token=s.3epDv29lsVfc0oZadkjs6qRN
    
    WARNING! The following warnings were returned from Vault:
    
    * This cluster is being promoted to a replication primary. Vault will be
    unavailable for a brief period and will resume service shortly.
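If you drive steps 1 through 4 from a script, the values you need (Nonce, OTP, and Encoded Token) can be pulled from the table output shown above. The sketch below assumes exactly that default table layout, which is a fragile contract; a machine-readable output format, where your CLI version supports one for this command, is a sturdier choice. The helper names are hypothetical.

```shell
# Hypothetical helpers that read "vault operator generate-root -dr-token" output
# in the default table format shown in this guide.

# Print the value of a two-column row, e.g. gr_field Nonce or gr_field OTP.
# NF == 2 skips rows such as "OTP Length    24".
gr_field() {
  awk -v key="$1" '$1 == key && NF == 2 { print $2; exit }'
}

# Print the value of the "Encoded Token" row (empty until the threshold is met).
gr_encoded_token() {
  awk '$1 == "Encoded" && $2 == "Token" { print $3; exit }'
}

# Decode an encoded DR operation token with the OTP from the -init step.
# Usage: dr_token_decode ENCODED_TOKEN OTP
dr_token_decode() {
  vault operator generate-root -dr-token -decode="$1" -otp="$2"
}

# Example (the initiator keeps the OTP; key holders only need the nonce):
#   init_out="$(vault operator generate-root -dr-token -init)"
#   nonce="$(printf '%s\n' "$init_out" | gr_field Nonce)"
#   otp="$(printf '%s\n' "$init_out" | gr_field OTP)"
```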
    

»Option 1 - Demote DR Primary to Secondary

If the original DR primary cluster (Cluster A) becomes operational again after Cluster B was promoted, you can demote Cluster A to become a secondary.

Remember that there is only one primary cluster in a DR replication set. At this point, Cluster A's data is outdated because of its outage. Demoting it to a DR secondary lets it replicate current data from the new DR primary cluster (Cluster B).

  1. Cluster A still considers itself a DR primary, so you should be able to log in with a root token. Run the following command to demote Cluster A to a secondary.

    $ vault write -f sys/replication/dr/primary/demote
    

    After the demotion, Cluster A does not attempt to connect to a primary, but it retains its cluster ID, so it can be reconnected to the same DR replication set without wiping local storage. Perform the following steps to complete the update-primary operation.

  2. On the new primary cluster (Cluster B), generate a secondary activation token similar to what you have done in Enable DR Primary Replication.

    $ vault write sys/replication/dr/primary/secondary-token id=new-secondary
    

    Copy the generated wrapping_token which you will need when you invoke the sys/replication/dr/secondary/update-primary endpoint later.

  3. On Cluster A, generate the DR operation token similar to Promote DR Secondary to Primary.

    Example:

    $ vault operator generate-root -dr-token -init
    
    A One-Time-Password has been generated for you and is shown in the OTP field.
    You will need this value to decode the resulting root token, so keep it safe.
    Nonce         829b8057-a486-cd02-6ce0-0a2c5d5ab0ce
    Started       true
    ...
    
  4. Distribute the generated nonce to each unseal key holder so that they can execute the generate-root command with their unseal key.

    $ vault operator generate-root -dr-token \
           -nonce=829b8057-a486-cd02-6ce0-0a2c5d5ab0ce \
           <unseal_key_of_original_dr_primary_here>
    
  5. Once the threshold has been reached, the output contains the encoded DR operation token which you need to decode first.

    $ vault operator generate-root -dr-token \
           -decode=JGsAeTApUAQsIGJTAxAIYgobcRo9TCY3IwA \
           -otp=WEaATFbgIi01meg1AUGNNySFle
    
    s.a8do2ceIRbnuoSKN6Ts5uqOe
    
  6. Finally, invoke the sys/replication/dr/secondary/update-primary endpoint.

    $ vault write sys/replication/dr/secondary/update-primary \
           dr_operation_token=s.a8do2ceIRbnuoSKN6Ts5uqOe \
           token="eyJhbGciOiJFUzUxMiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJhY2..."
    

    Here, token is the wrapping_token you copied from Cluster B, and dr_operation_token is the DR operation token you decoded on Cluster A.
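Steps 2 and 6 can be run from one workstation by overriding VAULT_ADDR for each command. This is a minimal sketch under stated assumptions: CLUSTER_A_ADDR and CLUSTER_B_ADDR are hypothetical variables holding each cluster's API address, VAULT_TOKEN is valid on Cluster B, and the decoded DR operation token for Cluster A has been saved to a file. The DR operation token is passed on standard input so that it does not show up in the process list.

```shell
# Hypothetical helper for Option 1, steps 2 and 6.
# CLUSTER_A_ADDR / CLUSTER_B_ADDR: assumed API addresses of the two clusters.
# Usage: repoint_to_cluster_b DR_OPERATION_TOKEN_FILE
repoint_to_cluster_b() {
  local dr_token activation_token
  dr_token="$(cat "$1")" || return 1
  # Step 2: request a secondary activation token from the new primary (Cluster B).
  activation_token="$(VAULT_ADDR="$CLUSTER_B_ADDR" vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id=new-secondary)" || return 1
  # Step 6: point the demoted Cluster A at Cluster B; "-" reads the DR
  # operation token from standard input.
  printf '%s' "$dr_token" |
    VAULT_ADDR="$CLUSTER_A_ADDR" vault write sys/replication/dr/secondary/update-primary \
      dr_operation_token=- token="$activation_token"
}
```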

»Option 2 - Disable the original DR Primary

Once the DR secondary cluster (Cluster B) is promoted to be the new primary, you may want to disable DR replication on the original primary (Cluster A) when it becomes operational again.

Remember that there is only one primary cluster in a DR replication set, and Cluster A's data is outdated because of its outage.

On Cluster A, run the following command to disable DR replication.

$ vault write -f sys/replication/dr/primary/disable

WARNING! The following warnings were returned from Vault:

* This cluster is having replication disabled. Vault will be unavailable for
  a brief period and will resume service shortly.

Any secondaries will no longer be able to connect.
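To confirm the result, you can read the DR replication status on Cluster A. This minimal sketch assumes VAULT_ADDR targets Cluster A and that the status endpoint reports a mode field whose value is primary, secondary, or disabled; the helper name is hypothetical. The same check can follow any of the promotions and demotions in this guide.

```shell
# Hypothetical helper: succeed only if the DR replication mode matches.
# Assumes VAULT_ADDR targets the cluster you want to check.
# Usage: expect_dr_mode disabled|primary|secondary
expect_dr_mode() {
  local mode
  mode="$(vault read -field=mode sys/replication/dr/status)" || return 1
  if [ "$mode" = "$1" ]; then
    echo "DR mode is $mode"
  else
    echo "expected DR mode $1, got $mode" >&2
    return 1
  fi
}

# Example after Option 2:
#   expect_dr_mode disabled
```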

»DR Failback

Currently, Cluster B is the active primary.

Once Cluster A is back to a healthy state, you may wish to revert it to being the primary. To achieve this, you must promote Cluster A back to be the DR primary (perform Promote DR Secondary to Primary on Cluster A) and then demote Cluster B to DR secondary (refer to Option 1).

You need a DR operation token to perform this task. If you do not have a batch DR operation token, you must generate a DR operation token first.

  1. On Cluster A, start the DR operation token generation process.

    $ vault operator generate-root -dr-token -init
    

    The generated output would look like:

    A One-Time-Password has been generated for you and is shown in the OTP field.
    You will need this value to decode the resulting root token, so keep it safe.
    Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started       true
    Progress      0/3
    Complete      false
    OTP           EYHAkPQYvvz93e8iI3pg1maQ
    OTP Length    24
    
  2. To generate a DR operation token, each unseal key holder must run the following command with their unseal key.

    Example:

    $ vault operator generate-root -dr-token \
           -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
           <primary_unseal_key_1>
    
  3. Once the threshold has been reached, the output contains the encoded DR operation token.

    $ vault operator generate-root -dr-token \
         -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
         <primary_unseal_key_3>
    
    Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started          true
    Progress         3/3
    Complete         true
    Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf
    
  4. Decode the generated DR operation token (Encoded Token).

    Example:

    $ vault operator generate-root -dr-token \
         -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
         -otp="EYHAkPQYvvz93e8iI3pg1maQ"
    
    s.3epDv29lsVfc0oZadkjs6qRN
    
  5. Run the following command on Cluster A to promote it back to DR primary, using the DR operation token you just decoded. (If you still have the DR operation token you generated for Cluster A in Option 1, you can use that instead.)

    Example:

    $ vault write sys/replication/dr/secondary/promote \
         dr_operation_token="s.3epDv29lsVfc0oZadkjs6qRN"
    
    WARNING! The following warnings were returned from Vault:
    
    * This cluster is being promoted to a replication primary. Vault will be
    unavailable for a brief period and will resume service shortly.
    
  6. Run the following command on Cluster B to demote it to a secondary.

    $ vault write -f sys/replication/dr/primary/demote
    
    WARNING! The following warnings were returned from Vault:
    
    * This cluster is being demoted to a replication secondary. Vault will be
    unavailable for a brief period and will resume service shortly.
    
  7. On Cluster A, generate a secondary activation token, as you did in Enable DR Primary Replication.

    $ vault write sys/replication/dr/primary/secondary-token id=secondary
    

    Copy the generated wrapping_token; you will need it when you invoke the sys/replication/dr/secondary/update-primary endpoint.

  8. On Cluster B, invoke the sys/replication/dr/secondary/update-primary endpoint using the wrapping_token you just generated on Cluster A and the DR operation token that you generated for Cluster B in Promote DR Secondary to Primary.

    If you don't have the DR Operation Token any more, you can create a new one by following the steps described in Promote DR Secondary to Primary.

    Example:

    $ vault write sys/replication/dr/secondary/update-primary \
         dr_operation_token=s.YxmD095A8fKRGNGNiteJnEiE \
         token="eyJhbGciOiJFUzUxMiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJhY2..."
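Once the DR operation tokens exist, steps 5 through 8 are ordinary API calls that can be collected into one script. This is a minimal sketch under stated assumptions: CLUSTER_A_ADDR and CLUSTER_B_ADDR are hypothetical variables holding the clusters' API addresses, VAULT_TOKEN is valid on both clusters for the demote and secondary-token calls, and each decoded DR operation token is stored in its own file. It follows the order given above and stops at the first failure. Because each promotion or demotion makes Vault briefly unavailable, a real script would also wait for each cluster to recover before the next call; that wait is omitted here.

```shell
# Hypothetical failback helper (steps 5-8). Usage:
#   failback_to_cluster_a CLUSTER_A_DR_TOKEN_FILE CLUSTER_B_DR_TOKEN_FILE
failback_to_cluster_a() {
  local a_token b_token activation_token
  a_token="$(cat "$1")" || return 1
  b_token="$(cat "$2")" || return 1

  # Step 5: promote Cluster A back to DR primary ("-" reads the token from stdin).
  printf '%s' "$a_token" |
    VAULT_ADDR="$CLUSTER_A_ADDR" vault write sys/replication/dr/secondary/promote \
      dr_operation_token=- || return 1

  # Step 6: demote Cluster B to a DR secondary.
  VAULT_ADDR="$CLUSTER_B_ADDR" vault write -f sys/replication/dr/primary/demote || return 1

  # Step 7: request a secondary activation token from Cluster A.
  activation_token="$(VAULT_ADDR="$CLUSTER_A_ADDR" vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id=secondary)" || return 1

  # Step 8: point Cluster B at Cluster A.
  printf '%s' "$b_token" |
    VAULT_ADDR="$CLUSTER_B_ADDR" vault write sys/replication/dr/secondary/update-primary \
      dr_operation_token=- token="$activation_token"
}
```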
    

»Important Note about Automated DR Failover

Vault does not support automatic failover or promotion of a DR secondary cluster. This is a deliberate choice, because it is difficult to evaluate accurately whether a failover should or shouldn't happen. For example, imagine a DR secondary loses its connection to the primary. Is it because the primary is down, or because the network between the two has failed?

If the DR secondary promotes itself and clients start connecting to it, you now have two active clusters whose data sets immediately start diverging. Neither cluster can tell, from its own perspective alone, which data set is correct.

Vault's API supports performing the various replication operations programmatically, so you can write your own logic to automate some of these operations based on experience in your own environment. You can review the available replication APIs in the Vault API documentation for the sys/replication endpoints.

»Help and Reference