
Vault HA Cluster with Integrated Storage

»Challenge

Vault supports many storage providers to persist its encrypted data (e.g. Consul, MySQL, DynamoDB, etc.). These providers require:

  • Their own administration, which increases complexity and total administrative overhead.
  • Provider configuration to allow Vault as a client.
  • Vault configuration to connect to the provider as a client.
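
For example, a minimal sketch of an external storage stanza using Consul (hypothetical agent address; the Consul cluster itself must be deployed, secured, and administered separately):

storage "consul" {
  address = "127.0.0.1:8500"  # address of a separately run Consul agent
  path    = "vault/"          # key prefix under which Vault persists its data
}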

»Solution

Use Vault's Integrated Storage to persist the encrypted data. The integrated storage has the following benefits:

  • Integrated into Vault (reducing total administration)
  • All configuration within Vault
  • Supports failover and multi-cluster replication
  • Eliminates additional network requests
  • Performance gains (reduces disk write/read overhead)
  • Lowers complexity when diagnosing issues (leading to faster time to recovery)
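
By comparison, a minimal sketch of an Integrated Storage stanza (hypothetical path and node ID; the actual stanzas used in this guide appear in the Setup section):

storage "raft" {
  path    = "/opt/vault/data"  # local directory for Raft data (hypothetical path)
  node_id = "node-a"           # unique identifier for this node (hypothetical)
}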

Reference Architecture

»Prerequisites

This guide requires Vault, sudo access, and additional configuration to create the cluster.

  • Install Vault.

  • Next, retrieve the configuration by cloning or downloading the hashicorp/vault-guides repository from GitHub.

    Clone the repository:

    $ git clone https://github.com/hashicorp/vault-guides.git
    

    Or download the repository as a ZIP archive from the GitHub page.


    This repository contains supporting content for all of the Vault learn guides. The content specific to this guide can be found within a sub-directory.

  • Finally, go into the vault-guides/operations/raft-storage/local directory.

    $ cd vault-guides/operations/raft-storage/local
    

»Setup

The cluster.sh script configures and starts four Vault servers. The scenario is as follows:

  • vault_1 (http://127.0.0.1:8200) is initialized and unsealed. Its root token is used to create a transit key that enables auto-unseal for the other Vault servers. This server does not join the cluster.
  • vault_2 (http://127.0.0.2:8200) is initialized and unsealed. This server starts as the cluster leader. An example kv-v2 secret is created.
  • vault_3 (http://127.0.0.3:8200) is only started. You will join it to the cluster.
  • vault_4 (http://127.0.0.4:8200) is only started. You will join it to the cluster.
  1. Make the cluster.sh script executable:

    $ chmod +x cluster.sh
    
  2. Set up the local loopback addresses for each Vault:

    $ ./cluster.sh create network
    
    [vault_2] Enabling local loopback on 127.0.0.2 (requires sudo)
    Password:
    
    [vault_3] Enabling local loopback on 127.0.0.3 (requires sudo)
    
    [vault_4] Enabling local loopback on 127.0.0.4 (requires sudo)
    

    The 127.0.0.0/8 address block is assigned for use as the Internet host loopback address (RFC 3330).
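
    For reference, enabling these addresses amounts to adding loopback aliases. A sketch of the underlying commands (the exact invocations live in cluster.sh; macOS and Linux variants shown):

    $ sudo ifconfig lo0 alias 127.0.0.2 up    # macOS: add an alias to the loopback interface
    $ sudo ip addr add 127.0.0.2/8 dev lo     # Linux: equivalent using iproute2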

  3. Create the configuration for each Vault:

    $ ./cluster.sh create config
    [vault_1] Creating configuration
      - creating $DEMO_HOME/config-vault_1.hcl
    [vault_2] Creating configuration
      - creating $DEMO_HOME/config-vault_2.hcl
      - creating $DEMO_HOME/raft-vault_2
    [vault_3] Creating configuration
      - creating $DEMO_HOME/config-vault_3.hcl
      - creating $DEMO_HOME/raft-vault_3
    [vault_4] Creating configuration
      - creating $DEMO_HOME/config-vault_4.hcl
      - creating $DEMO_HOME/raft-vault_4
    
  4. Set up vault_1:

    $ ./cluster.sh setup vault_1
    [vault_1] starting Vault server @ http://127.0.0.1:8200
    
    [vault_1] initializing and capturing the unseal key and root token
    
    [vault_1] Unseal key: O8jPtbOvpjLOnXOXmE2iIvFXZ0l2q4WyHdAK5UWQdiw=
    [vault_1] Root token: s.1AoVuEpdGp1aZez2iGiINPo6
    
    [vault_1] unsealing and logging in
    Key             Value
    ---             -----
    Seal Type       shamir
    Initialized     true
    Sealed          false
    Total Shares    1
    Threshold       1
    Version         1.3.0
    Cluster Name    vault-cluster-775635e5
    Cluster ID      a2a7e532-d690-45f0-e532-c3702be729fd
    HA Enabled      false
    Success! You are now authenticated. The token information displayed below
    is already stored in the token helper. You do NOT need to run "vault login"
    again. Future Vault requests will automatically use this token.
    
    Key                  Value
    ---                  -----
    token                s.1AoVuEpdGp1aZez2iGiINPo6
    token_accessor       RkToomYDiEnmfbsWbDgTFjiH
    token_duration       ∞
    token_renewable      false
    token_policies       ["root"]
    identity_policies    []
    policies             ["root"]
    
    [vault_1] enabling the transit secrets engine and storing key to enable remaining nodes to join the cluster
    Success! Enabled the transit secrets engine at: transit/
    Success! Data written to: transit/keys/unseal_key
    
  5. Set up vault_2:

    $ ./cluster.sh setup vault_2
    Using [vault_1] root token (s.1AoVuEpdGp1aZez2iGiINPo6) to retrieve transit key for auto-unseal
    [vault_2] starting Vault server @ http://127.0.0.2:8200
    
    [vault_2] initializing and capturing the recovery key and root token
    
    [vault_2] Recovery key: u7LNz4gDy0W/REpjLiVrr5JD05yeRzZQv218JN4rc3A=
    [vault_2] Root token: s.hVvcy6gpcaifc9JEetzJENes
    
    [vault_2] waiting to join Vault cluster (15 seconds)
    
    [vault_2] logging in and enabling the KV secrets engine
    Success! You are now authenticated. The token information displayed below
    is already stored in the token helper. You do NOT need to run "vault login"
    again. Future Vault requests will automatically use this token.
    
    Key                  Value
    ---                  -----
    token                s.hVvcy6gpcaifc9JEetzJENes
    token_accessor       mOzYyELAnMM9qAYRl9jItSPP
    token_duration       ∞
    token_renewable      false
    token_policies       ["root"]
    identity_policies    []
    policies             ["root"]
    Success! Enabled the kv-v2 secrets engine at: kv/
    
    [vault_2] storing secret 'kv/apikey' to demonstrate snapshot and recovery methods
    Key              Value
    ---              -----
    created_time     2019-11-21T20:23:20.007558Z
    deletion_time    n/a
    destroyed        false
    version          1
    ====== Metadata ======
    Key              Value
    ---              -----
    created_time     2019-11-21T20:23:20.007558Z
    deletion_time    n/a
    destroyed        false
    version          1
    
    ===== Data =====
    Key       Value
    ---       -----
    webapp    ABB39KKPTWOR832JGNLS02
    
  6. Set up vault_3:

    $ ./cluster.sh setup vault_3
    Using [vault_1] root token (s.1AoVuEpdGp1aZez2iGiINPo6) to retrieve transit key for auto-unseal
    [vault_3] starting Vault server @ http://127.0.0.3:8200
    
  7. Set up vault_4:

    $ ./cluster.sh setup vault_4
    Using [vault_1] root token (s.1AoVuEpdGp1aZez2iGiINPo6) to retrieve transit key for auto-unseal
    [vault_4] starting Vault server @ http://127.0.0.4:8200
    
  8. View the contents of the working directory to see the configuration files, raft storage files, and log files for each node.

    $ tree
    .
    ├── README.md
    ├── cluster.sh
    ├── config-vault_1.hcl
    ├── config-vault_2.hcl
    ├── config-vault_3.hcl
    ├── config-vault_4.hcl
    ├── raft-vault_2
    │   ├── raft
    │   │   ├── raft.db
    │   │   └── snapshots
    │   └── vault.db
    ├── raft-vault_3
    │   ├── raft
    │   │   ├── raft.db
    │   │   └── snapshots
    │   └── vault.db
    ├── raft-vault_4
    │   ├── raft
    │   │   ├── raft.db
    │   │   └── snapshots
    │   └── vault.db
    ├── recovery_key-vault_2
    ├── root_token-vault_1
    ├── root_token-vault_2
    ├── unseal_key-vault_1
    ├── vault_1.log
    ├── vault_2.log
    ├── vault_3.log
    └── vault_4.log
    
    9 directories, 20 files
    
  9. Validate that all four Vault servers are running and that vault_2 is initialized and unsealed:

    $ ./cluster.sh status
    Found 4 Vault services
    
    [vault_1] status
    ...
    Initialized     true
    Sealed          false
    ...
    HA Enabled      false
    
    [vault_2] status
    ...
    Initialized     true
    Sealed          false
    ...
    HA Enabled      true
    
    [vault_3] status
    ...
    Initialized     false
    Sealed          true
    ...
    HA Enabled      true
    
    [vault_4] status
    ...
    Initialized     false
    Sealed          true
    ...
    HA Enabled      true
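
    The script wraps vault status for each node; you can also query a single node directly by pointing VAULT_ADDR at its API address (a usage sketch; against a sealed node the command still prints status but exits with code 2):

    $ VAULT_ADDR=http://127.0.0.3:8200 vault status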
    

»Create an HA cluster

Currently vault_2 is initialized, unsealed, and has HA enabled. It is the only node in its cluster. The remaining nodes, vault_3 and vault_4, have not yet joined the cluster.

»Examine the leader

Let's discover more about the configuration of vault_2 and how it describes the current state of the cluster.

  1. First, examine the vault_2 server configuration file (config-vault_2.hcl).

    $ cat config-vault_2.hcl
      storage "raft" {
        path    = "$DEMO_HOME/raft-vault_2/"
        node_id = "vault_2"
      }
      listener "tcp" {
        address = "127.0.0.2:8200"
        cluster_address = "127.0.0.2:8201"
        tls_disable = true
      }
      seal "transit" {
        address            = "http://127.0.0.1:8200"
        # token is read from VAULT_TOKEN env
        # token              = ""
        disable_renewal    = "false"
    
        // Key configuration
        key_name           = "unseal_key"
        mount_path         = "transit/"
      }
      disable_mlock = true
      cluster_addr = "http://127.0.0.2:8201"
    

To use Integrated Storage, the storage stanza is set to raft. Its path parameter specifies where Vault data is stored ($DEMO_HOME/raft-vault_2/), and node_id assigns this node a unique identifier within the cluster.
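
You can confirm that the node persists its Raft data under that path by listing the directory; its contents match the tree output shown earlier:

$ ls raft-vault_2
raft      vault.db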

  2. Configure the vault CLI as a client to vault_2.

    $ export VAULT_ADDR="http://127.0.0.2:8200"
    
  3. Examine the current raft peer set.

    $ vault operator raft list-peers
    Node       Address           State     Voter
    ----       -------           -----     -----
    vault_2    127.0.0.2:8201    leader    true
    

    The cluster reports that vault_2 is the only node and is currently the leader.

  4. Examine the vault_2 root token.

    $ cat root_token-vault_2
    s.ADlzR3iOhRNG88yRsqWCuKSA
    

The cluster.sh script captured the root token of vault_2 during its setup and stored it in this file. This root token has privileged access to all nodes within the cluster.

»Join nodes to the cluster

Add vault_3 to the cluster using the vault operator raft join command.

  1. Open a new terminal and set the working directory to the vault-guides/operations/raft-storage/local directory. Set VAULT_ADDR to the vault_3 API address.

    $ export VAULT_ADDR="http://127.0.0.3:8200"
    
  2. Next, join vault_3 to the vault_2 cluster.

    $ vault operator raft join http://127.0.0.2:8200
    
    Key       Value
    ---       -----
    Joined    true
    

    Here http://127.0.0.2:8200 is the address of vault_2, which has already been initialized and auto-unsealed. vault_2 is the active node, and its storage acts as the leader of this cluster. The same join can be performed over the HTTP API, as shown in the sketch after this list.

  3. Next, configure the vault CLI to use the vault_2 root token for requests.

    $ export VAULT_TOKEN=$(cat root_token-vault_2)
    
  4. Examine the current raft peer set.

    $ vault operator raft list-peers
    
    Node       Address           State     Voter
    ----       -------           -----     -----
    vault_2    127.0.0.2:8201    leader    true
    vault_3    127.0.0.3:8201    follower  true
    

    Now, vault_3 is listed as a follower node.

  5. Examine the vault_3 log file (vault_3.log).

    $ cat vault_3.log
    ...
    2019-11-21T14:36:15.837-0600 [TRACE] core: found new active node information, refreshing
    2019-11-21T14:36:15.837-0600 [DEBUG] core: parsing information for new active node: active_cluster_addr=https://127.0.0.2:8201 active_redirect_addr=http://127.0.0.2:8200
    2019-11-21T14:36:15.837-0600 [DEBUG] core: refreshing forwarding connection
    2019-11-21T14:36:15.837-0600 [DEBUG] core: clearing forwarding clients
    2019-11-21T14:36:15.837-0600 [DEBUG] core: done clearing forwarding clients
    2019-11-21T14:36:15.837-0600 [DEBUG] core: done refreshing forwarding connection
    2019-11-21T14:36:15.838-0600 [DEBUG] core: creating rpc dialer: host=fw-2e732f3f-c586-45ca-9975-66260b0c0d63
    2019-11-21T14:36:15.869-0600 [DEBUG] core.cluster-listener: performing client cert lookup
    

    The log describes the process of joining the cluster.

  6. Finally, verify that you can read the secret at kv/apikey.

    $ vault kv get kv/apikey
    ====== Metadata ======
    Key              Value
    ---              -----
    created_time     2019-11-22T19:52:29.59021Z
    deletion_time    n/a
    destroyed        false
    version          1
    
    ===== Data =====
    Key       Value
    ---       -----
    webapp    ABB39KKPTWOR832JGNLS02
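
The join in step 2 can also be performed over Vault's HTTP API using the sys/storage/raft/join endpoint. A sketch with curl (TLS is disabled in this demo cluster; the request is sent to the joining node, vault_3):

$ curl --silent --request POST \
    --data '{"leader_api_addr": "http://127.0.0.2:8200"}' \
    http://127.0.0.3:8200/v1/sys/storage/raft/join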
    

»Retry join

You can use the vault operator raft join command to join vault_4 to the cluster in the same way you joined vault_3 to the cluster. However, if the connection details of all the nodes are known beforehand, you can configure the retry_join stanza in the server configuration file to automatically join the cluster.

  1. Stop vault_4.

    $ ./cluster.sh stop vault_4
    
  2. Modify the server configuration file config-vault_4.hcl by adding retry_join blocks inside the storage stanza as follows.

    storage "raft" {
      path    = "/git/vault-guides/operations/raft-storage/local/raft-vault_4/"
      node_id = "vault_4"
      retry_join {
        leader_api_addr = "http://127.0.0.2:8200"
      }
      retry_join {
        leader_api_addr = "http://127.0.0.3:8200"
      }
    }
    
    ## ...snipped...
    

    Since the addresses of vault_2 and vault_3 are known, you can predefine the possible cluster leader addresses in retry_join blocks.

  3. Start vault_4.

    $ ./cluster.sh start vault_4
    
  4. Open a new terminal and set the working directory to the vault-guides/operations/raft-storage/local directory. Set VAULT_ADDR to the vault_4 API address.

    $ export VAULT_ADDR="http://127.0.0.4:8200"
    
  5. List the peers and notice that vault_4 is listed as a follower node.

    $ vault operator raft list-peers
    
    Node       Address           State       Voter
    ----       -------           -----       -----
    vault_2    127.0.0.2:8201    leader      true
    vault_3    127.0.0.3:8201    follower    true
    vault_4    127.0.0.4:8201    follower    true
    
  6. In this terminal, configure the vault CLI to use the vault_2 root token for requests.

    $ export VAULT_TOKEN=$(cat root_token-vault_2)
    
  7. Patch the secret at kv/apikey.

    $ vault kv patch kv/apikey expiration="365 days"
    Key              Value
    ---              -----
    created_time     2019-11-22T21:43:33.914925Z
    deletion_time    n/a
    destroyed        false
    version          2
    
  8. Return to the terminal you used to configure vault_3 and read the secret again.

    $ vault kv get kv/apikey
    ====== Metadata ======
    Key              Value
    ---              -----
    created_time     2019-11-22T21:43:33.914925Z
    deletion_time    n/a
    destroyed        false
    version          2
    
    ======= Data =======
    Key           Value
    ---           -----
    expiration    365 days
    webapp        ABB39KKPTWOR832JGNLS02
    

»Raft snapshots for data recovery

Raft provides an interface to take snapshots of its data. These snapshots can be used later to restore data if it ever becomes necessary.

»Take a snapshot

Execute the following command to take a snapshot of the data.

$ vault operator raft snapshot save demo.snapshot
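
In practice you would take snapshots on a regular schedule. A minimal sketch as a crontab entry (hypothetical schedule and backup path; assumes VAULT_ADDR and a valid token are present in cron's environment, and note that % must be escaped in crontab):

# Hourly snapshot to a timestamped file
0 * * * * vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d\%H\%M).snapshot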

»Simulate loss of data

First, verify that a secret exists at kv/apikey.

$ vault kv get kv/apikey

Next, delete the secret at kv/apikey.

$ vault kv metadata delete kv/apikey

Finally, verify that the data has been deleted.

$ vault kv get kv/apikey
No value found at kv/data/apikey

»Restore data from a snapshot

First, recover the data by restoring the snapshot stored in demo.snapshot.

$ vault operator raft snapshot restore demo.snapshot

(Optional) You can tail the server log of the active node (vault_2).

$ tail -f vault_2.log

Verify that the data has been recovered.

$ vault kv get kv/apikey

====== Metadata ======
Key              Value
---              -----
created_time     2019-07-02T05:50:39.038931Z
deletion_time    n/a
destroyed        false
version          2

======= Data =======
Key           Value
---           -----
expiration    365 days
webapp        ABB39KKPTWOR832JGNLS02

»Resign from active duty

Currently, vault_2 is the active node. Experiment to see what happens if vault_2 steps down from its active node duty.

In the terminal where VAULT_ADDR is set to http://127.0.0.2:8200, execute the step-down command.

$ vault operator step-down
Success! Stepped down: http://127.0.0.2:8200

In the terminal where VAULT_ADDR is set to http://127.0.0.3:8200, examine the raft peer set.

$ vault operator raft list-peers

Node       Address           State       Voter
----       -------           -----       -----
vault_2    127.0.0.2:8201    follower    true
vault_3    127.0.0.3:8201    leader      true
vault_4    127.0.0.4:8201    follower    true

Notice that vault_3 has been promoted to leader and vault_2 has become a follower.

»Remove a cluster member

You may need to remove nodes from the cluster for maintenance, upgrades, or to conserve compute resources.

Remove vault_4 from the cluster.

$ vault operator raft remove-peer vault_4
Peer removed successfully!

Verify that vault_4 has been removed from the cluster by viewing the raft cluster peers.

$ vault operator raft list-peers

Node       Address           State       Voter
----       -------           -----       -----
vault_2    127.0.0.2:8201    follower    true
vault_3    127.0.0.3:8201    leader      true

»Add vault_4 back to the cluster

This is an optional step.

If you wish to add vault_4 back to the HA cluster, return to the terminal where VAULT_ADDR is set to the vault_4 API address (http://127.0.0.4:8200), and stop vault_4.

$ ./cluster.sh stop vault_4

Delete the data directory.

$ rm -r raft-vault_4

Now, create the raft-vault_4 directory again because the raft storage destination must exist before you can start the server.

$ mkdir raft-vault_4

Start the vault_4 server.

$ ./cluster.sh start vault_4

You can again examine the peer set to confirm that vault_4 successfully joined the cluster as a follower.

$ vault operator raft list-peers

Node       Address           State       Voter
----       -------           -----       -----
vault_2    127.0.0.2:8201    follower    true
vault_3    127.0.0.3:8201    leader      true
vault_4    127.0.0.4:8201    follower    true

»Recovery mode for troubleshooting

In the case of an outage caused by corrupt entries in the storage backend, an operator may need to start Vault in recovery mode. In this mode, Vault runs with minimal capabilities and exposes a subset of its API.

»Start in recovery mode

Use the setup script to stop all remaining cluster members to simulate an outage.

First stop vault_2.

$ ./cluster.sh stop vault_2

Found 1 Vault service(s) matching that name
[vault_2] stopping

Stop vault_4 if you added it back to the cluster.

$ ./cluster.sh stop vault_4

Found 1 Vault service(s) matching that name
[vault_4] stopping

Stop vault_3.

$ ./cluster.sh stop vault_3

Found 1 Vault service(s) matching that name
[vault_3] stopping

Start vault_3 in recovery mode.

$ VAULT_TOKEN=$(cat root_token-vault_1) VAULT_ADDR=http://127.0.0.1:8201 \
              vault server -recovery -config=config-vault_3.hcl
==> Vault server configuration:

               Seal Type: transit
         Transit Address: http://127.0.0.1:8200
        Transit Key Name: unseal_key
      Transit Mount Path: transit/
         Cluster Address: http://127.0.0.3:8201
               Log Level: info
           Recovery Mode: true
                 Storage: raft
                 Version: Vault v1.4.0-rc1

==> Vault server started! Log data will stream in below:

2019-11-22T16:45:06.871-0600 [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2019-11-22T16:45:06.888-0600 [INFO]  seal-transit: unable to renew token, disabling renewal: err="Error making API request.

URL: PUT http://127.0.0.1:8200/v1/auth/token/renew-self
Code: 400. Errors:

* invalid lease ID"

»Create a recovery operational token

  1. Open a new terminal and configure the vault CLI as a client to vault_3.

    $ export VAULT_ADDR="http://127.0.0.3:8200"
    
  2. Next, generate a temporary one-time password (OTP).

    $ vault operator generate-root -generate-otp -recovery-token
    VTFdQgNjmSSCfiWmaztJgZa6MN
    
  3. Next, start the generation of the recovery token with the OTP.

    $ vault operator generate-root -init \
        -otp=VTFdQgNjmSSCfiWmaztJgZa6MN -recovery-token
    
    Nonce         13829b90-94eb-b7d8-f774-7b495569562d
    Started       true
    Progress      0/1
    Complete      false
    OTP Length    26
    
  4. Next, view the recovery key that was generated during the setup of vault_2.

    $ cat recovery_key-vault_2
    aBmg4RDBqihWVwYFG+hJOyNiLFAeFcDEN9yHyaEjc4c=
    
  5. Next, create an encoded token.

    $ vault operator generate-root -recovery-token
    
    Operation nonce: 13829b90-94eb-b7d8-f774-7b495569562d
    Unseal Key (will be hidden):
    

    Enter the recovery key when prompted. The output looks similar to the following.

    Nonce            13829b90-94eb-b7d8-f774-7b495569562d
    Started          true
    Progress         1/1
    Complete         true
    Encoded Token    JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo
    
  6. Finally, complete the creation of the recovery token with the encoded token and OTP.

    $ vault operator generate-root \
      -decode=JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo \
      -otp=VTFdQgNjmSSCfiWmaztJgZa6MN \
      -recovery-token
    
    r.y8TcOwoZ1yBABbUlmc9dbL5d
    

»Fix the issue in the storage backend

In recovery mode, Vault launches with a minimal API enabled, allowing you to interact with the raw system backend.

Use the recovery operational token to list the contents at sys/raw/sys.

$ VAULT_TOKEN=r.y8TcOwoZ1yBABbUlmc9dbL5d vault list sys/raw/sys
Keys
----
counters/
policy/
token/

Imagine that during your investigation you discover that a value (or values) at a particular path is the cause of the outage. To simulate this, assume that the value at sys/raw/sys/counters is the root cause of the outage.

Delete the counters at sys/raw/sys/counters.

$ VAULT_TOKEN=r.y8TcOwoZ1yBABbUlmc9dbL5d vault delete sys/raw/sys/counters
Success! Data deleted (if it existed) at: sys/raw/sys/counters
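
To confirm the entry is gone, you can list the parent path again with the same recovery token; counters/ should no longer appear among the keys:

$ VAULT_TOKEN=r.y8TcOwoZ1yBABbUlmc9dbL5d vault list sys/raw/sys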

»Resume normal operations

First, stop the Vault server running in recovery mode by pressing Ctrl+C in its terminal.

Next, start the Vault service for vault_3.

$ ./cluster.sh start vault_3

Start vault_2.

$ ./cluster.sh start vault_2

»Clean up

When you are done, you can stop all services, remove all configuration, and undo all modifications to your local system with the same cluster.sh script you used for setup.

Clean up your local workstation.

$ ./cluster.sh clean
...
Found 4 Vault services

Each node in the Vault cluster required:
 - local loopback address
 - a configuration file
 - a directory to store the contents of the Raft storage.

Removing local loopback address: 127.0.0.2 (sudo required)
Removing local loopback address: 127.0.0.3 (sudo required)
Removing local loopback address: 127.0.0.4 (sudo required)
Removing configuration file $DEMO_HOME/config-vault_1.hcl
Removing configuration file $DEMO_HOME/config-vault_2.hcl
Removing configuration file $DEMO_HOME/config-vault_3.hcl
Removing configuration file $DEMO_HOME/config-vault_4.hcl
Removing raft storage file $DEMO_HOME/raft-vault_2
Removing raft storage file $DEMO_HOME/raft-vault_3
Removing raft storage file $DEMO_HOME/raft-vault_4
Removing key $DEMO_HOME/unseal_key-vault_1
Removing key $DEMO_HOME/recovery_key-vault_2
Removing key $DEMO_HOME/root_token-vault_1
Removing key $DEMO_HOME/root_token-vault_2
Removing log file $DEMO_HOME/vault_1.log
Removing log file $DEMO_HOME/vault_2.log
Removing log file $DEMO_HOME/vault_3.log
Removing log file $DEMO_HOME/vault_4.log
Removing demo.snapshot
Clean complete
