
[BETA] Vault HA Cluster with Integrated Storage on AWS

Challenge

Vault supports many storage providers to persist its encrypted data (e.g. Consul, MySQL, DynamoDB). These providers require:

  • Their own administration, increasing complexity and total administrative overhead.
  • Provider configuration to allow Vault as a client.
  • Vault configuration to connect to the provider as a client.

Solution

Use Raft as the storage backend to persist the encrypted data. The Raft storage backend provides these benefits (a minimal configuration sketch appears after this list):

  • Integrated into Vault (reducing total administration)
  • All configuration within Vault
  • Supports failover and multi-cluster replication
  • Eliminates additional network requests
  • Performance gains (reduces disk write/read overhead)
  • Lowers complexity when diagnosing issues (leading to faster time to recovery)
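
For reference, enabling Integrated Storage requires only a single storage stanza in the Vault server configuration. The snippet below is a minimal sketch with illustrative values; the actual stanza used by the nodes in this guide appears later in the "Examine the leader" step.

storage "raft" {
  path    = "/vault/data"   # directory where Vault persists its Raft data
  node_id = "node_1"        # unique identifier for this node in the cluster
}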

Reference Architecture

Prerequisites

This guide requires an AWS account, Terraform, and additional configuration to create the cluster:

  • First, create an AWS account with AWS credentials and an EC2 key pair

  • Next, install Terraform

  • Next, retrieve the configuration by cloning or downloading the hashicorp/vault-guides repository from GitHub.

    Clone the repository:

    $ git clone https://github.com/hashicorp/vault-guides.git
    

    Or download the repository as a ZIP archive from GitHub.

    This repository contains supporting content for all of the Vault learn guides. The content specific to this guide can be found within a sub-directory.

  • Finally, go into the vault-guides/operations/raft-storage/aws directory.

    $ cd vault-guides/operations/raft-storage/aws
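
Before moving on, you can confirm that Terraform is installed and available on your PATH:

$ terraform version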
    

Setup

The Terraform files provision four EC2 instances, each running Vault.

Scenario

  • vault_1 is initialized and unsealed. The root token is used to create a transit key that enables the other Vault servers to auto-unseal. This Vault does not join the cluster.
  • vault_2 is initialized and unsealed. This Vault starts as the cluster leader. An example K/V-V2 secret is created.
  • vault_3 is only started. You will join it to the cluster.
  • vault_4 is only started. You will join it to the cluster.
  1. Set your AWS credentials as environment variables:

    $ export AWS_ACCESS_KEY_ID="<YOUR_AWS_ACCESS_KEY_ID>"
    $ export AWS_SECRET_ACCESS_KEY="<YOUR_AWS_SECRET_ACCESS_KEY>"
    
  2. Copy terraform.tfvars.example and rename to terraform.tfvars

    $ cp terraform.tfvars.example terraform.tfvars
    
  3. Edit terraform.tfvars to override the default settings that describe your environment:

    # AWS EC2 Region
    # default: 'us-east-1'
    # @see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions
    aws_region = "us-east-1"
    
    # AWS EC2 Availability Zone
    # default: 'us-east-1a'
    # @see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#using-regions-availability-zones-launching
    availability_zones = "us-east-1a"
    
    # AWS EC2 Key Pair
    # @see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
    key_name = "learn-vault-key"
    
    # Specify a name here to tag all instances
    # default: 'learn-vault-raft_storage'
    environment_name = "learn-vault"
    
  4. Initialize Terraform:

    $ terraform init
    Initializing modules...
    Downloading terraform-aws-modules/vpc/aws 2.21.0 for vault_demo_vpc...
    - vault_demo_vpc in .terraform/modules/vault_demo_vpc/terraform-aws-modules-terraform-aws-vpc-2417f60
    
    Initializing the backend...
    
    Initializing provider plugins...
    - Checking for available provider plugins...
    - Downloading plugin for provider "template" (hashicorp/template) 2.1.2...
    - Downloading plugin for provider "aws" (hashicorp/aws) 2.40.0...
    ...
    
    
  5. Apply the Terraform plan with automatic approval:

    $ terraform apply -auto-approve
    ...
    Apply complete! Resources: 20 added, 0 changed, 0 destroyed.
    

    The Terraform output will display the IP addresses of the provisioned Vault nodes.

    Example:

    vault_1 (13.56.238.70) | internal (10.0.101.21)
      - Initialized and unsealed.
      - The root token creates a transit key that enables the other Vaults to auto-unseal.
      - Does not join the High-Availability (HA) cluster.
    
    vault_2 (13.57.14.206) | internal (10.0.101.22)
      - Initialized and unsealed.
      - The root token and recovery key is stored in /tmp/key.json.
      - K/V-V2 secret engine enabled and secret stored.
      - Leader of HA cluster
    
      $ ssh -l ubuntu 13.57.14.206 -i <path/to/key.pem>
    
      # Root token
      $ ssh -l ubuntu 13.57.14.206 -i <path/to/key.pem> "cat ~/root_token"
      # Recovery key
      $ ssh -l ubuntu 13.57.14.206 -i <path/to/key.pem> "cat ~/recovery_key"
    
    vault_3 (54.183.135.252) | internal (10.0.101.23)
      - Started
      - You will join it to cluster started by vault_2
    
      $ ssh -l ubuntu 54.183.135.252 -i <path/to/key.pem>
    
    vault_4 (13.57.238.164) | internal (10.0.101.24)
      - Started
      - You will join it to cluster started by vault_2
    
      $ ssh -l ubuntu 13.57.238.164 -i <path/to/key.pem>
    

Create an HA cluster

Currently vault_2 is initialized, unsealed, and has HA enabled. It is the only node in a cluster. The remaining nodes, vault_3 and vault_4, have not joined its cluster.
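
If you want to confirm this from your workstation first, you can run vault status on vault_2 over SSH (substitute the public IP address from your own Terraform output); it should report that the node is initialized and unsealed. VAULT_ADDR is set explicitly here in case the remote shell does not load the profile:

$ ssh -l ubuntu 13.57.14.206 -i <path/to/key.pem> "VAULT_ADDR=http://127.0.0.1:8200 vault status"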

Examine the leader

Let's discover more about the configuration of vault_2 and how it describes the current state of the cluster.

First, open a new terminal and SSH into vault_2:

$ ssh -l ubuntu 13.57.14.206 -i <path/to/key.pem>

Next, examine the vault_2 server configuration file (/etc/vault.d/vault.hcl):

$ sudo cat /etc/vault.d/vault.hcl
storage "raft" {
  path    = "/vault/vault_2"
  node_id = "vault_2"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  cluster_address     = "0.0.0.0:8201"
  tls_disable = true
}

seal "transit" {
  address            = "http://10.0.101.21:8200"
  token              = "root"
  disable_renewal    = "false"

  // Key configuration
  key_name           = "unseal_key"
  mount_path         = "transit/"
}

api_addr = "http://13.57.14.206:8200"
cluster_addr = "http://10.0.101.22:8201"
disable_mlock = true
ui=true

To use Integrated Storage, the storage stanza is set to raft. The path parameter specifies where Vault persists its data on disk (/vault/vault_2), and node_id identifies this node within the Raft cluster.

Next, configure the vault CLI to use the root token for requests:

$ export VAULT_TOKEN=$(cat ~/root_token)

Finally, view the Raft configuration information:

$ vault operator raft configuration -format=json | jq  ".data.config.servers[]"
{
  "address": "10.0.101.22:8201",
  "leader": true,
  "node_id": "vault_2",
  "protocol_version": "3",
  "voter": true
}

The cluster reports that vault_2 is the only node and is currently the leader.
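
Newer Vault releases also expose the peer list through a dedicated subcommand; if your version supports it, the following reports the same membership information in a table:

$ vault operator raft list-peers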

Join nodes to the cluster

Join vault_3 to the cluster:

First, open a new terminal and SSH into vault_3:

$ ssh -l ubuntu 54.183.135.252 -i <path/to/key.pem>

Next, join vault_3 to the vault_2 cluster:

$ vault operator raft join http://vault_2:8200
Key       Value
---       -----
Joined    true

When a node joins the cluster, it receives a challenge from the leader node and must unseal itself to answer that challenge. This node unseals itself through vault_1, via the transit secrets engine auto-unseal method, and then correctly answers the challenge.
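
If the vault_2 hostname does not resolve in your environment, you can point the join command at the leader's API address directly. For example, using the internal IP address of vault_2 shown in the Terraform output above:

$ vault operator raft join http://10.0.101.22:8200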

Next, configure the vault CLI to use the vault_2 root token for requests:

$ export VAULT_TOKEN="s.lZMMSsFkuz4KAsJlFTNA3myK"

Finally, get the secret at kv/apikey:

$ vault kv get kv/apikey
 ====== Metadata ======
Key              Value
---              -----
created_time     2019-11-22T19:52:29.59021Z
deletion_time    n/a
destroyed        false
version          1

===== Data =====
Key       Value
---       -----
webapp    ABB39KKPTWOR832JGNLS02

This node has access to the secrets defined within the cluster of which it is a member.
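
As a quick check, you can also read a single field of the secret instead of the full entry:

$ vault kv get -field=webapp kv/apikey
ABB39KKPTWOR832JGNLS02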

Join vault_4 to the cluster:

First, open a new terminal and SSH into vault_4:

$ ssh -l ubuntu 13.57.238.164 -i <path/to/key.pem>

Next, join vault_4 to the vault_2 cluster:

$ vault operator raft join http://vault_2:8200
Key       Value
---       -----
Joined    true

Now you have a cluster with three nodes.

Next, configure the vault CLI to use the vault_2 root token for requests:

$ export VAULT_TOKEN="s.lZMMSsFkuz4KAsJlFTNA3myK"

Next, list the cluster members:

$ vault operator raft configuration -format=json | jq ".data.config.servers[]"
{
  "address": "10.0.101.22:8201",
  "leader": true,
  "node_id": "vault_2",
  "protocol_version": "3",
  "voter": true
}
{
  "address": "10.0.101.23:8201",
  "leader": false,
  "node_id": "vault_3",
  "protocol_version": "3",
  "voter": true
}
{
  "address": "10.0.101.24:8201",
  "leader": false,
  "node_id": "vault_4",
  "protocol_version": "3",
  "voter": true
}

All members of the cluster (vault_2, vault_3 and vault_4) report the same list of nodes.

Next, patch the secret at kv/apikey:

$ vault kv patch kv/apikey expiration="365 days"
Key              Value
---              -----
created_time     2019-12-10T17:50:49.508451861Z
deletion_time    n/a
destroyed        false
version          2

The updated secret is visible to every node in the cluster.

To verify, return to the terminal connected to vault_3 and get the same secret again:

$ vault kv get kv/apikey
====== Metadata ======
Key              Value
---              -----
created_time     2019-12-10T17:50:49.508451861Z
deletion_time    n/a
destroyed        false
version          2

======= Data =======
Key           Value
---           -----
expiration    365 days
webapp        ABB39KKPTWOR832JGNLS02

Raft snapshots for data recovery

Raft provides an interface to take snapshots of its data. These snapshots can later be used to restore data if that ever becomes necessary.

Take a snapshot

Return to the terminal where you SSH'd into vault_2 to create a snapshot.

Take a snapshot of the data:

$ vault operator raft snapshot save demo.snapshot
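
In a real deployment you would typically schedule snapshots rather than take them by hand. The cron entry below is only an illustrative sketch and is not part of this guide; the snapshot directory and token handling are assumptions you would adapt to your own environment:

# Hypothetical hourly snapshot job for the ubuntu user's crontab.
# Assumes the snapshot directory exists and that the root token file from this
# guide is still present; use a dedicated, least-privileged token in practice.
0 * * * * VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=$(cat /home/ubuntu/root_token) /usr/local/bin/vault operator raft snapshot save /home/ubuntu/snapshots/vault-$(date +\%Y\%m\%d\%H).snapshot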

Simulate loss of data

First, verify that a secret exists at kv/apikey:

$ vault kv get kv/apikey

Next, delete the secret at kv/apikey:

$ vault kv metadata delete kv/apikey

Finally, verify that the data has been deleted:

$ vault kv get kv/apikey
No value found at kv/data/apikey

Restore data from a snapshot

First, recover the data by restoring the data found in demo.snapshot:

$ vault operator raft snapshot restore demo.snapshot

Finally, verify that the data has been recovered:

$ vault kv get kv/apikey
====== Metadata ======
Key              Value
---              -----
created_time     2019-12-10T17:50:49.508451861Z
deletion_time    n/a
destroyed        false
version          2

======= Data =======
Key           Value
---           -----
expiration    365 days
webapp        ABB39KKPTWOR832JGNLS02

Remove a cluster member

You may need to remove nodes from the cluster for maintenance, upgrades, or to preserve compute resources.

Remove vault_3 from the cluster:

$ vault operator raft remove-peer vault_3
Peer removed successfully!

Verify that vault_3 has been removed from the cluster by viewing the raft configuration:

$ vault operator raft configuration -format=json | jq ".data.config.servers[]"
{
  "address": "10.0.101.22:8201",
  "leader": true,
  "node_id": "vault_2",
  "protocol_version": "3",
  "voter": true
}
{
  "address": "10.0.101.24:8201",
  "leader": false,
  "node_id": "vault_4",
  "protocol_version": "3",
  "voter": true
}
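
If you later want vault_3 back in the cluster, it can rejoin the same way it joined initially. A sketch, run from the vault_3 terminal (depending on your Vault version, you may first need to stop the service and clear the node's old Raft data under its storage path before rejoining):

$ vault operator raft join http://vault_2:8200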

Recovery mode for troubleshooting

In the case of an outage caused by corrupt entries in the storage backend, an operator may need to start Vault in recovery mode. In this mode, Vault runs with minimal capabilities and exposes a subset of its API.
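
The nodes in this guide start recovery mode through a systemd template unit (shown in the next steps), but recovery mode can also be started directly from a shell on the node, assuming the same configuration directory:

$ vault server -config /etc/vault.d -recovery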

Simulate outage

Stop the Vault service on all remaining cluster members, vault_2 and vault_4, to simulate an outage.

Return to the terminal where you SSH'd into vault_2 and stop the Vault service:

# Stop Vault on vault_2
$ sudo systemctl stop vault

Return to the terminal where you SSH'd into vault_4 and stop the Vault service:

# Stop Vault on vault_4
$ sudo systemctl stop vault

Start in recovery mode

Return to the terminal where you SSH'd into vault_2 and start Vault in recovery mode:

$ sudo systemctl start vault@-recovery

The content after the @ symbol is appended to the vault server command executed by this service. This is equivalent to running vault server -config /etc/vault.d -recovery.
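
The exact unit file is provisioned for you by the Terraform setup, but a simplified, illustrative sketch of such a systemd template unit shows how the instance name is passed through:

# /etc/systemd/system/vault@.service (simplified sketch; the real unit differs)
[Service]
# %i expands to the instance name -- everything after the @ in the unit name --
# so "vault@-recovery" runs: vault server -config /etc/vault.d -recovery
ExecStart=/usr/local/bin/vault server -config /etc/vault.d %i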

View the status of the vault@-recovery service:

$ sudo systemctl status vault@-recovery
● vault@-recovery.service - Vault
   Loaded: loaded (/etc/systemd/system/vault@.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-12-11 22:45:36 UTC; 18min ago
  Process: 5907 ExecStartPre=/sbin/setcap cap_ipc_lock=+ep /usr/local/bin/vault (code=exited, status=0/SUCCESS)
 Main PID: 5916 (vault)
    Tasks: 7 (limit: 1152)
   CGroup: /system.slice/system-vault.slice/vault@-recovery.service
           └─5916 /usr/local/bin/vault server -config /etc/vault.d -recovery

Dec 11 22:45:36 ip-10-0-101-244 vault[5916]:                Log Level: info
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]:            Recovery Mode: true
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]:                  Storage: raft
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]:                  Version: Vault v1.3.0
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: ==> Vault server started! Log data will stream in below:
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: 2019-12-11T22:45:36.850Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: 2019-12-11T22:45:36.857Z [INFO]  seal-transit: unable to renew token, disabling renewal: err="Error making API request.
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: URL: PUT http://10.0.101.22:8200/v1/auth/token/renew-self
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: Code: 400. Errors:
Dec 11 22:45:36 ip-10-0-101-244 vault[5916]: * lease is not renewable"

Create a recovery operational token

First, generate a temporary one-time password (otp):

$ vault operator generate-root -generate-otp -recovery-token
dNsrrcQLSvEDsNAfOUdRN3ECGI

Next, start the generation of the recovery token with the otp:

$ vault operator generate-root -init -otp=dNsrrcQLSvEDsNAfOUdRN3ECGI -recovery-token
Nonce         fae5045e-4a6a-729c-92b2-1cce79af5afb
Started       true
Progress      0/1
Complete      false
OTP Length    26

Next, view the recovery key that was generated during the setup of vault_2:

$ cat ~/recovery_key
dABZMe9xbAPx5MisJXSbn4UL0E6ZaH13iVF/JlgZGNM=

Next, create an encoded token and enter the vault_2 recovery key when prompted:

$ vault operator generate-root -recovery-token
Operation nonce: fae5045e-4a6a-729c-92b2-1cce79af5afb
Unseal Key (will be hidden):
Nonce            fae5045e-4a6a-729c-92b2-1cce79af5afb
Started          true
Progress         1/1
Complete         true
Encoded Token    FmA0Sio6E3Q+EnITOR4IAgQtKj0CSzQkETA

Finally, complete the creation of a recovery token with the encoded token and otp:

$ vault operator generate-root \
  -decode=FmA0Sio6E3Q+EnITOR4IAgQtKj0CSzQkETA \
  -otp dNsrrcQLSvEDsNAfOUdRN3ECGI \
  -recovery-token
r.G8XYB8md7WJPIdKxNoLxqgVy

Fix the issue in the storage backend

In recovery mode, Vault launches with a minimal API enabled. In this mode, you can interact with the raw system backend.

Use the recovery token to list the contents at sys/raw/sys:

$ VAULT_TOKEN=r.G8XYB8md7WJPIdKxNoLxqgVy vault list sys/raw/sys
Keys
----
counters/
policy/
token/

Imagine that your investigation reveals a value at a particular path to be the cause of the outage. To simulate this, assume the value at sys/raw/sys/counters is the culprit.

Delete the path at sys/raw/sys/counters:

$ VAULT_TOKEN=r.G8XYB8md7WJPIdKxNoLxqgVy vault delete sys/raw/sys/counters
Success! Data deleted (if it existed) at: sys/raw/sys/counters

Resume normal operations

First, stop the vault@-recovery service:

$ sudo systemctl stop vault@-recovery

Next, restart the Vault service on vault_2, vault_3, and vault_4:

$ sudo systemctl start vault
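
Once the service is running on each node, it should auto-unseal through vault_1 again; you can confirm with vault status (Sealed should report false):

$ vault status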

Clean up

Return to the first terminal where you created the cluster and use Terraform to destroy the cluster:

# Destroy the AWS resources provisioned by Terraform
$ terraform destroy -auto-approve
...

# Delete the state file
$ rm *tfstate*

Help and reference