[Enterprise] Disaster Recovery Replication Setup

Organizations need a disaster recovery (DR) strategy to protect their Vault deployment against the catastrophic failure of an entire cluster. Vault Enterprise supports multi-datacenter deployments in which you replicate data across datacenters for performance as well as disaster recovery.

A cluster is the basic unit of Vault Enterprise replication, which follows the leader-follower model. The leader cluster is referred to as the primary cluster and is considered the system of record. Data is streamed from the primary cluster to all secondary (follower) clusters.

The Mount Filter guide provides step-by-step instructions on setting up performance replication. This guide focuses on DR replication setup.

Prerequisites

This intermediate Vault operations guide assumes that you have some working knowledge of Vault.

You need two Vault Enterprise clusters: one behaves as the primary cluster, and another becomes the secondary.

Workflow

Step 1 and Step 2 walk you through the basic steps to configure DR replication.

When a catastrophic failure causes the primary cluster (Cluster A) to be inoperable, promote the DR secondary (Cluster B) to become the new primary.

If the original primary cluster (Cluster A) becomes operational again after you successfully promoted Cluster B to be the new primary, you can configure Cluster A to behave as a DR secondary.

Otherwise, you can disable the DR replication on Cluster A.

After failing over to Cluster B (Step 4: Option 1), all traffic is routed to Cluster B. If you want Cluster A to become the primary again, perform Step 5 to fail back to Cluster A.

Step 1: Enable DR Primary Replication

CLI command

Enable DR replication on the primary cluster (Cluster A).

$ vault write -f sys/replication/dr/primary/enable
WARNING! The following warnings were returned from Vault:

* This cluster is being enabled as a primary for replication. Vault will be
unavailable for a brief period and will resume service shortly.

Generate a secondary token.

$ vault write sys/replication/dr/primary/secondary-token id="secondary"

The output should look similar to:

Key                              Value
---                              -----
wrapping_token:                  eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhZGRyIjoiaHR0cDovLzEzLjU3LjIwLjQxOjgyMDAiLCJleHAiOjE1MjkzMzkzMzEsImlhdCI6MTUyOTMzNzUzMSwianRpIjoiZDZmMmMzZTItMTZjNS1mNTU0LWYxMzAtNzMzZDE0OWNiNTIzIiwidHlwZSI6IndyYXBwaW5nIn0.MIGIAkIArsC3s1x7GYnEbaYwAbYUj-Wgp4B3Q3kVXL0BbaKvsECySV4Pwtm--i24OSQfI9zAlsG8ZypOWJdngRa59wlhWdQCQgG22-I-aNWPehjsqmwwEADU-u37LUrR6O0MsUCqtfWYwIM9o7PFP1wMZ4JwDGftQXUH6hIrkXZDxnnGsSCJ1Vl75w
wrapping_accessor:               bab0ea36-23f6-d21d-4ca6-a9c3673766a3
wrapping_token_ttl:              30m
wrapping_token_creation_time:    2018-06-18 15:58:51.645117216 +0000 UTC
wrapping_token_creation_path:    sys/replication/dr/primary/secondary-token
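
When this step is scripted, it can help to capture only the wrapping token instead of copying it out of the table. The following is a minimal sketch, assuming the vault CLI is on the PATH and that VAULT_ADDR and VAULT_TOKEN already point at Cluster A. The function name new_dr_secondary_token is hypothetical; -field is the CLI option that prints a single field of the response.

```shell
# Sketch: print only the wrapping token for a new DR secondary.
# Assumes VAULT_ADDR and VAULT_TOKEN point at the primary (Cluster A).
# new_dr_secondary_token is a hypothetical helper name.
new_dr_secondary_token() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "usage: new_dr_secondary_token <secondary-id>" >&2
    return 1
  fi
  # -field=wrapping_token prints just that field instead of the full table.
  vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id="$id"
}

# Example (not run here):
#   token="$(new_dr_secondary_token secondary)"
```

The wrapping token expires after its TTL (30m in the output above), so the captured value should be used on the secondary promptly.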

API call using cURL

Enable DR replication on the primary cluster (Cluster A) by invoking the /sys/replication/dr/primary/enable endpoint.

Example:

$ curl --header "X-Vault-Token: ..." \
      --request POST \
      --data '{}' \
      https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/enable

{
  "request_id": "ef38af20-9c1f-138a-2d03-bbb6410fb0fc",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "This cluster is being enabled as a primary for replication. Vault will be unavailable for a brief period and will resume service shortly."
  ],
  "auth": null
}

Generate a secondary token by invoking the /sys/replication/dr/primary/secondary-token endpoint.

Example:

$ curl --header "X-Vault-Token: ..." \
      --request POST \
      --data '{ "id": "secondary"}' \
      https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/secondary-token | jq

{
  "request_id": "",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": {
    "token": "eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhZGRyIjoiaHR0cDovLzEzLjU3LjIwLjQxOjgyMDAiLCJleHAiOjE1MjkzNDQzMjcsImlhdCI6MTUyOTM0MjUyNywianRpIjoiYmRiZTJiNzEtODgwMS05YjZjLTNjMTQtMzVkNDI3NDQ3MjEzIiwidHlwZSI6IndyYXBwaW5nIn0.MIGIAkIBmESVVq_83l9hixTN7Ot0v5XQMsQfi1zV9APooZWkLvbS2olBWSQnskykQQH6GskMOi-ypOlAabqxWmfoCLA8-TICQgHRdkbJGgAQtWmjc8Z-ZEgymMv8YZq6qQxbUtPXloyM-cf_1Y1qmdGDYWtjPqoF5m1Bt_WkAJl9MguVb04QMWSotw",
    "accessor": "7e56e9da-178c-119d-1d01-807a203fa0b3",
    "ttl": 1800,
    "creation_time": "2018-06-18T17:22:07.129747708Z",
    "creation_path": "sys/replication/dr/primary/secondary-token"
  },
  "warnings": null,
  "auth": null
}

Web UI

Open a web browser, launch the Vault UI (e.g. https://cluster-A.example.com:8200/ui), and then log in.

  1. Select the arrow next to Status and click Enable under REPLICATION.

  2. Select the Disaster Recovery (DR) radio button.

  3. Click Enable replication.

  4. Select the Secondaries tab, and then click Add secondary.

  5. Populate the Secondary ID field, and click Generate token.

  6. Click Copy to copy the token which you will need to enable the DR secondary cluster.

Step 2: Enable DR Secondary Replication

The following operations must be performed on the DR secondary cluster (Cluster B).

CLI command

Enable DR replication on the secondary cluster.

$ vault write sys/replication/dr/secondary/enable token="..."

Where the token is the wrapping_token obtained from the primary cluster.

Expected output:

WARNING! The following warnings were returned from Vault:

* Vault has successfully found secondary information; it may take a while to
perform setup tasks. Vault will be unavailable until these tasks and initial
sync complete.
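
Because the secondary is unavailable until the setup tasks and initial sync complete, a script may need to wait before it continues. Below is a hedged sketch that polls the DR replication status path (sys/replication/dr/status) and waits for its mode field to report an expected value. The helper name wait_for_dr_mode, the default of 30 attempts, and the 5-second delay are assumptions made for illustration, not values prescribed by Vault.

```shell
# Sketch: wait until the cluster reports the expected DR replication mode.
# Assumes VAULT_ADDR points at the cluster being checked (Cluster B here).
# wait_for_dr_mode and its defaults (30 attempts, 5 s delay) are illustrative.
wait_for_dr_mode() {
  local expected="$1" attempts="${2:-30}" delay="${3:-5}"
  local mode="" i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    # Errors are tolerated because Vault may be briefly unavailable.
    mode="$(vault read -field=mode sys/replication/dr/status 2>/dev/null)" || mode=""
    if [ "$mode" = "$expected" ]; then
      echo "DR mode is '$mode'"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for DR mode '$expected' (last seen: '${mode:-unknown}')" >&2
  return 1
}

# Example (not run here):
#   wait_for_dr_mode secondary
```

A matching mode only shows that the cluster has taken on the role; it does not by itself confirm that the initial sync has finished.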

API call using cURL

Enable DR replication on the secondary cluster.

$ tee payload.json <<EOF
{
 "token": "..."
}
EOF

$ curl --header "X-Vault-Token: ..." \
      --request POST \
      --data @payload.json \
      https://cluster-B.example.com:8200/v1/sys/replication/dr/secondary/enable | jq

{
  "request_id": "7a9730c1-b6fc-6557-5c0a-081e1f89ed2d",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "Vault has successfully found secondary information; it may take a while to perform setup tasks. Vault will be unavailable until these tasks and initial sync complete."
  ],
  "auth": null
}

Where the token in payload.json is the token obtained from the primary cluster.

Web UI

  1. Now, launch the Vault UI for the secondary cluster (e.g. https://cluster-B.example.com:8200/ui).

  2. Select the arrow next to Status and click Enable under REPLICATION.

  3. Check the Disaster Recovery (DR) radio button and select secondary under the Cluster mode. Paste the token you copied from the primary in the Secondary activation token field.

  4. Click Enable replication.

Warning: This will immediately clear all data in the secondary cluster.


Step 3: Promote DR Secondary to Primary

This step walks you through the promotion of the secondary cluster (Cluster B) to become the new primary when a catastrophic failure causes the primary cluster (Cluster A) to become inoperable.

Refer to the Important Note about Automated DR Failover section for more background information.

First, you must generate a DR operation token which you need to promote Cluster B. The process outlined below is similar to Generating a Root Token (via CLI).

CLI command

  1. Start the DR operation token generation process.

$ vault operator generate-root -dr-token -init

The generated output would look like:

A One-Time-Password has been generated for you and is shown in the OTP field.
You will need this value to decode the resulting root token, so keep it safe.
Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
Started       true
Progress      0/3
Complete      false
OTP           EYHAkPQYvvz93e8iI3pg1maQ
OTP Length    24

  2. In order to generate a DR operation token, the following operation must be executed by each unseal key holder.

Example:

$ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        <primary_unseal_key_1>

Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
Started          true
Progress         1/3
Complete         false

  3. Once the threshold has been reached, the output will contain the encoded DR operation token.

Example:

$ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        <primary_unseal_key_3>

Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
Started          true
Progress         3/3
Complete         true
Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf

  4. Decode the generated DR operation token (Encoded Token).

Example:

$ vault operator generate-root -dr-token \
        -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
        -otp="EYHAkPQYvvz93e8iI3pg1maQ"

s.3epDv29lsVfc0oZadkjs6qRN

  5. Finally, promote the DR secondary (Cluster B) to become the new primary. The request must pass the DR operation token.

Example:

$ vault write /sys/replication/dr/secondary/promote \
        dr_operation_token=s.3epDv29lsVfc0oZadkjs6qRN

WARNING! The following warnings were returned from Vault:

  * This cluster is being promoted to a replication primary. Vault will be
  unavailable for a brief period and will resume service shortly.
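
If the token generation is scripted, the nonce and OTP have to be pulled out of the table that the -init call prints. The helper below is a small sketch that reads such output on standard input and prints the value for a single-word label such as Nonce, OTP, or Complete. The function name table_value is hypothetical, and the sketch assumes the two-column layout shown in the examples above.

```shell
# Sketch: extract one value from the two-column output of
# `vault operator generate-root`. table_value is a hypothetical helper.
# Only rows with exactly two fields match, so "OTP Length 24" is not
# mistaken for the OTP row.
table_value() {
  awk -v key="$1" '$1 == key && NF == 2 { print $2; exit }'
}

# Example (not run here):
#   init_out="$(vault operator generate-root -dr-token -init)"
#   nonce="$(printf '%s\n' "$init_out" | table_value Nonce)"
#   otp="$(printf '%s\n' "$init_out" | table_value OTP)"
```

The OTP is needed to decode the final token, so a script that captures it should treat it as a secret.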

Web UI

  1. Click on Generate Operation Token.

  2. A quorum of unseal keys must be entered to create a new operation token for the DR secondary.

  3. Once the threshold has been reached, the output displays the encoded DR operation token. Click the Copy icon.

  4. Execute the copied CLI command from a terminal to generate a DR operation token.

Example:

$ vault operator generate-root -dr-token -otp="I4BbXfN0F2biXY53bXx4bKPwU0" \
        -decode="OhobGjUifglzc1oPEwtyfSUWEUAHAT4yPHU"
s.YxmD095A8fKRGNGNiteJnEiE

  5. Now, click the Promote tab, and then enter the generated DR operation token.

  6. Click Promote cluster.

When prompted, "Are you sure you want to promote this cluster?", click Promote cluster again to complete.

Step 4: Option 1 - Demote DR Primary to Secondary

If the original DR primary cluster (Cluster A) becomes operational again, you may want to utilize the cluster by making it a DR secondary cluster. This step explains how to demote Cluster A to a secondary.


CLI command

  1. Execute the following command to demote Cluster A to a secondary.

$ vault write -f sys/replication/dr/primary/demote

WARNING! The following warnings were returned from Vault:

  * This cluster is being demoted to a replication secondary. Vault will be
  unavailable for a brief period and will resume service shortly.

  2. On the new primary cluster (Cluster B), generate a secondary activation token similar to what you have done in Step 1.

$ vault write sys/replication/dr/primary/secondary-token id=new-secondary

Copy the generated wrapping_token which you will need when you invoke the sys/replication/dr/secondary/update-primary endpoint later.

  3. On Cluster A, generate the DR operation token similar to Step 3.

Example:

$ vault operator generate-root -dr-token -init

A One-Time-Password has been generated for you and is shown in the OTP field.
You will need this value to decode the resulting root token, so keep it safe.
Nonce         829b8057-a486-cd02-6ce0-0a2c5d5ab0ce
Started       true
...

  4. Distribute the generated nonce to each unseal key holder so that they can execute the generate-root command with their unseal key.

$ vault operator generate-root -dr-token \
         -nonce=829b8057-a486-cd02-6ce0-0a2c5d5ab0ce \
         <unseal_key_of_original_dr_primary_here>

  5. Once the threshold has been reached, the output will contain the encoded DR operation token which you need to decode first.

$ vault operator generate-root -dr-token \
        -decode=JGsAeTApUAQsIGJTAxAIYgobcRo9TCY3IwA \
        -otp=WEaATFbgIi01meg1AUGNNySFle

s.a8do2ceIRbnuoSKN6Ts5uqOe

  6. Finally, invoke the sys/replication/dr/secondary/update-primary endpoint.

$ vault write sys/replication/dr/secondary/update-primary \
        dr_operation_token=s.a8do2ceIRbnuoSKN6Ts5uqOe \
        token="eyJhbGciOiJFUzUxMiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJhY2..."

Here, the token value is the wrapping_token you copied from Cluster B.
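
The update-primary call needs two different secrets, and it is easy to run it with one of them missing or swapped. As a hedged sketch, the wrapper below takes both values from environment variables and refuses to run unless both are set. The names update_dr_primary, DR_OPERATION_TOKEN, and ACTIVATION_TOKEN are hypothetical and chosen only for this example.

```shell
# Sketch: point a demoted cluster at its new primary.
# update_dr_primary and the two variable names are hypothetical.
#   DR_OPERATION_TOKEN  decoded DR operation token generated on this cluster
#   ACTIVATION_TOKEN    wrapping_token generated on the new primary
update_dr_primary() {
  if [ -z "${DR_OPERATION_TOKEN:-}" ] || [ -z "${ACTIVATION_TOKEN:-}" ]; then
    echo "DR_OPERATION_TOKEN and ACTIVATION_TOKEN must both be set" >&2
    return 1
  fi
  vault write sys/replication/dr/secondary/update-primary \
    dr_operation_token="$DR_OPERATION_TOKEN" \
    token="$ACTIVATION_TOKEN"
}

# Example (not run here), with VAULT_ADDR pointing at Cluster A:
#   DR_OPERATION_TOKEN=... ACTIVATION_TOKEN=... update_dr_primary
```

Note that both values are still passed to the vault process as arguments, so this wrapper only guards against missing input; it does not hide the tokens from the local process list.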

Web UI

  1. From the Cluster A web UI, select the arrow next to Status and click Primary under REPLICATION.

  2. Select Manage.

  3. Click Demote cluster.

  4. When prompted, "Are you sure you want to demote this cluster?", click Demote cluster again to complete.

At this point, Cluster A will not attempt to connect to a primary, but will maintain knowledge of its cluster ID and can be reconnected to the same DR replication set without wiping local storage. Continue to perform the update primary operation.

  5. Click on Generate Operation Token.

  6. A quorum of unseal keys must be entered to create a new operation token for the DR secondary. This operation must be performed by each unseal key holder.

  7. Once the threshold has been reached, the output displays the encoded DR operation token. Click the Copy icon.

  8. Execute the copied CLI command from a terminal to generate a DR operation token.

$ vault operator generate-root -dr-token -otp="h0wSwoA6vP6tnN4f9JEPUFJWzy" \
        -decode="Gx4fYAMjOGM0aXcgOiFFHHQaDBZkDwwgFws"

s.h3tLyUB9ATToqzMPIF1IFwmr

  9. From the new primary cluster (Cluster B), select the Secondaries tab, and then click Add. Populate the Secondary ID field, and click Generate token.

  10. Click Copy to copy the token which you will need to enable the DR secondary cluster.

  11. Return to the Cluster A web UI and click the Update primary tab. Enter the generated DR operation token in the DR operation token field, and paste the secondary activation token you copied from Cluster B.

  12. Click Update primary. When prompted, "Are you sure you want to update this cluster's primary?", click Update primary again to complete.

Step 4: Option 2 - Disable DR Primary

Once the DR secondary cluster has been promoted to be the new primary, you may want to disable DR replication on the original primary when it becomes operational again.

CLI command

Execute the following command to disable DR replication.

$ vault write -f sys/replication/dr/primary/disable

WARNING! The following warnings were returned from Vault:

* This cluster is having replication disabled. Vault will be unavailable for
  a brief period and will resume service shortly.

Any secondaries will no longer be able to connect.
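
Since disabling replication cuts off every secondary, a script may want to confirm the cluster's role before running the command. The following is a minimal sketch under the assumption that the mode field of sys/replication/dr/status reports primary on a DR primary; the helper name disable_dr_primary_checked is hypothetical.

```shell
# Sketch: refuse to disable DR replication unless this cluster is a DR primary.
# disable_dr_primary_checked is a hypothetical helper name.
# Assumes VAULT_ADDR and VAULT_TOKEN point at the original primary (Cluster A).
disable_dr_primary_checked() {
  local mode
  mode="$(vault read -field=mode sys/replication/dr/status)" || return 1
  if [ "$mode" != "primary" ]; then
    echo "refusing to disable: DR mode is '$mode', expected 'primary'" >&2
    return 1
  fi
  vault write -f sys/replication/dr/primary/disable
}

# Example (not run here):
#   disable_dr_primary_checked
```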

API call using cURL

Invoke the sys/replication/dr/primary/disable endpoint to disable DR replication.

$ curl --header "X-Vault-Token: ..." \
       --request POST \
       https://cluster-A.example.com:8200/v1/sys/replication/dr/primary/disable | jq

{
   "request_id": "92a5f57a-2f7b-11be-b9dd-0f028396fba8",
   "lease_id": "",
   "renewable": false,
   "lease_duration": 0,
   "data": null,
   "wrap_info": null,
   "warnings": [
     "This cluster is having replication disabled. Vault will be unavailable for a brief period and will resume service shortly."
   ],
   "auth": null
}

Any secondaries will no longer be able to connect.

Web UI

Select Replication and click Disable replication.

When prompted, "Are you sure you want to disable replication on this cluster?", click Disable again to complete.

Any secondaries will no longer be able to connect.

Step 5: DR Failback

At this point, Cluster B is the active primary.

Once Cluster A is back to a healthy state, you may wish to make Cluster A the primary again. To achieve this, promote Cluster A back to DR primary (perform Step 3 on Cluster A) and then demote Cluster B to DR secondary (refer to Step 4: Option 1).

CLI command

  1. Execute the following command on Cluster A to promote it back to be the DR primary using the DR Operation Token you generated when you demoted Cluster A to DR secondary in Step 4: Option 1.

If you don't have the DR Operation Token any more, you can create a new one by following the steps described in Step 3 on Cluster A.

Example:

$ vault write /sys/replication/dr/secondary/promote \
        dr_operation_token=s.h3tLyUB9ATToqzMPIF1IFwmr

WARNING! The following warnings were returned from Vault:

  * This cluster is being promoted to a replication primary. Vault will be
  unavailable for a brief period and will resume service shortly.

  2. Execute the following command on Cluster B to demote it to a secondary.

$ vault write -f sys/replication/dr/primary/demote

WARNING! The following warnings were returned from Vault:

  * This cluster is being demoted to a replication secondary. Vault will be
  unavailable for a brief period and will resume service shortly.

  3. On Cluster A, generate a secondary activation token similar to what you have done in Step 1.

$ vault write sys/replication/dr/primary/secondary-token id=secondary

Copy the generated wrapping_token which you will need when you invoke the sys/replication/dr/secondary/update-primary endpoint later.

  4. On Cluster B, invoke the sys/replication/dr/secondary/update-primary endpoint using the wrapping_token you just generated on Cluster A, and the DR Operation Token that you generated in Step 3.

If you don't have the DR Operation Token any more, you can create a new one by following the steps described in Step 3.

Example:

$ vault write sys/replication/dr/secondary/update-primary \
        dr_operation_token=s.YxmD095A8fKRGNGNiteJnEiE \
        token="eyJhbGciOiJFUzUxMiIsImtpZCI6IiIsInR5cCI6IkpXVCJ9.eyJhY2..."
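
The four failback commands above alternate between the two clusters, which is easy to get wrong by hand. The sketch below strings them together in the same order, selecting the target cluster with the VAULT_ADDR environment variable on each call. Everything else is an assumption for illustration: the names dr_failback, CLUSTER_A_ADDR, CLUSTER_B_ADDR, A_DR_TOKEN, B_DR_TOKEN, and DR_PAUSE_SECONDS are hypothetical, VAULT_TOKEN is assumed to be accepted by both clusters, and the fixed pause is only a crude stand-in for a real readiness check.

```shell
# Sketch of the four failback commands in order. Hypothetical names:
# dr_failback, CLUSTER_A_ADDR, CLUSTER_B_ADDR, A_DR_TOKEN, B_DR_TOKEN,
# DR_PAUSE_SECONDS. Assumes VAULT_TOKEN is valid on both clusters and that
# the DR operation tokens were generated as described in the steps above.
dr_failback() {
  local activation pause="${DR_PAUSE_SECONDS:-10}"

  # 1. Promote Cluster A back to DR primary.
  VAULT_ADDR="$CLUSTER_A_ADDR" vault write sys/replication/dr/secondary/promote \
    dr_operation_token="$A_DR_TOKEN" || return 1
  sleep "$pause"  # Vault is briefly unavailable after a promotion

  # 2. Demote Cluster B to DR secondary.
  VAULT_ADDR="$CLUSTER_B_ADDR" vault write -f sys/replication/dr/primary/demote || return 1
  sleep "$pause"  # Vault is briefly unavailable after a demotion

  # 3. Generate a secondary activation token on Cluster A.
  activation="$(VAULT_ADDR="$CLUSTER_A_ADDR" vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id=secondary)" || return 1

  # 4. Point Cluster B at Cluster A.
  VAULT_ADDR="$CLUSTER_B_ADDR" vault write sys/replication/dr/secondary/update-primary \
    dr_operation_token="$B_DR_TOKEN" token="$activation"
}
```

The function stops at the first failing command so that a partial failback is noticed rather than continued. Treat it as an outline to adapt and rehearse, not as a tested runbook.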

Web UI

  1. On Cluster A, click the Promote tab, and then enter the DR Operation Token you generated when you demoted Cluster A to DR secondary in Step 4: Option 1.

If you don't have the DR Operation Token any more, you can create a new one by following the steps described in Step 3 on Cluster A.

  2. Click Promote cluster.

  3. On Cluster B, select the arrow next to Status and click Primary under REPLICATION.

  4. Select Manage.

  5. Click Demote cluster.

  6. When prompted, "Are you sure you want to demote this cluster?", click Demote cluster again to complete.

  7. On Cluster A, select the Secondaries tab, and then click Add. Populate the Secondary ID field (e.g. secondary), and click Generate token.

  8. Click Copy to copy the token which you will need to enable the DR secondary cluster.

  9. Return to Cluster B and click the Update primary tab. Enter the DR operation token that you generated in Step 3 in the DR operation token field, and paste the secondary activation token you copied from Cluster A.

If you don't have the DR Operation Token any more, you can create a new one by following the steps described in Step 3.

  10. Click Update primary. When prompted, "Are you sure you want to update this cluster's primary?", click Update primary again to complete.

Important Note about Automated DR Failover

Vault does not support an automatic failover/promotion of a DR secondary cluster, and this is a deliberate choice due to the difficulty in accurately evaluating why a failover should or shouldn't happen. For example, imagine a DR secondary loses its connection to the primary. Is it because the primary is down, or is it because networking between the two has failed?

If the DR secondary promotes itself and clients start connecting to it, you now have two active clusters whose data sets will immediately start diverging. There's no way to understand simply from one perspective or the other which one of them is right.

Vault's API supports performing various replication operations programmatically, which allows you to write your own logic for automating some of these operations based on experience within your own environment. You can review the available replication APIs in the Vault API documentation.
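
As one illustration of such custom logic, the sketch below probes a cluster's sys/health endpoint and maps the HTTP status code to a short label using Vault's default health status codes. The function names vault_health_code and describe_health_code are hypothetical. The sketch deliberately only reports: an unreachable result from one vantage point does not show whether the primary is down or the network path has failed, so the promotion decision stays with an operator.

```shell
# Sketch: classify the HTTP status code returned by Vault's sys/health
# endpoint. Function names are hypothetical. Nothing here promotes a cluster.
vault_health_code() {
  # Prints the HTTP status code; curl prints 000 when the connection fails.
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$1/v1/sys/health"
}

describe_health_code() {
  case "$1" in
    200) echo "active" ;;               # initialized, unsealed, active
    429) echo "standby" ;;              # unsealed standby
    472) echo "dr-secondary" ;;         # DR secondary
    473) echo "performance-standby" ;;  # performance standby
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;
    *)   echo "unknown" ;;
  esac
}

# Example (not run here):
#   describe_health_code "$(vault_health_code https://cluster-A.example.com:8200)"
```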

Help and Reference