Inspecting Data in Integrated Storage

In production deployments, Vault persists critical operational data to its configured durable storage. Gathering key facts about these data is often helpful during advanced troubleshooting.

During this type of troubleshooting for Vault servers which use the Integrated Storage backend, you can activate Recovery Mode on a server and inspect Vault data in an offline manner using an extremely limited API that prevents general use of Vault while troubleshooting is in progress.

One approach currently available for inspecting data in a server operating in recovery mode is to use the /sys/raw API endpoint.

This guide provides a detailed workflow for inspecting Vault data with example commands and responses to help familiarize you with this approach.

»Notes and prerequisites

Please pay attention to the following list of important notes and prerequisites before you begin with this guide.

  1. This is not meant to be a comprehensive guide covering every possible type of data which is found in Vault; rather, it is a means to help you get started in inspecting your Vault data so that you can familiarize yourself with the process.
  2. Vault version 1.4.0 or later using the Integrated Storage backend is required.
  3. To inspect Vault data using the vault CLI and HTTP API, this guide uses tools such as cURL and jq to fetch the information and process it.
  4. The HTTP API examples expect that the environment variable VAULT_TOKEN has the Recovery Token as its value.
  5. This guide uses examples from the root namespace; for details about inspecting data within other namespaces created using the Enterprise Namespaces feature, please see the What about Vault Enterprise namespaces? section.
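
Before starting, it can help to confirm that the tools listed above are installed. The following is a minimal sketch; the helper name require_tools is hypothetical and not part of Vault.

```shell
# Report any missing command-line tools; returns non-zero if one is absent.
# require_tools is a hypothetical helper name used only for illustration.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing required tool: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: require_tools vault curl jq
```

The function only checks that each command can be found on the PATH; it does not verify versions.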

»Problem

Certain advanced Vault troubleshooting situations can benefit from identifying issues with data stored in Vault. Examples range from identifying problematic auth methods or secrets engines that generate too many leases to investigating inexplicable growth in write ahead logs (WALs).

As Vault does not expose these kinds of metrics for the data in storage directly to the user, you must query the storage directly using available tooling and techniques.

In these situations, it can be necessary to isolate the Vault data from active use while inspecting it, to prevent further state changes by applications and clients. A Vault server can be started in Recovery Mode with a restricted API that provides access to the data for inspection and troubleshooting while preventing general purpose use of the Vault server at the same time.

»Solution

Details about data stored in integrated storage can be retrieved from the /sys/raw API endpoint while Vault is operating in recovery mode.

This guide describes how Vault data is laid out in storage, and provides an example of a practical and safe workflow for inspecting the data in Integrated Storage.

The workflow is outlined first; later sections describe Vault data in durable storage and then walk through inspecting it.

»Workflow

The workflow for examining data in Integrated Storage is as follows.

  1. Stop all Vault cluster servers
  2. Start the previously active server in recovery mode
  3. Generate a recovery mode operation token
  4. Inspect Vault data as required on this server with the /sys/raw API; the majority of examples use the List Raw API
  5. Stop the server
  6. Start the server in normal mode
  7. Rejoin the standby servers to the active server in a new state (i.e., with no existing data) using one of the following strategies (the numbers assume a 5-server cluster):
    • Join 4 new standby servers with no existing data to the active server to form the cluster, OR
    • Wipe the Vault data on the existing 4 standby servers and join them as new servers to the active server

The goal is to restore cluster operations and the original cluster size after inspecting data, so choose whichever approach makes the most sense for your environment and circumstances.

For more details, please refer to the recovery mode documentation.

»Stop all Vault cluster members

The first step in this workflow is to stop all Vault servers in the cluster, beginning with the standby servers.

Use the configured operating system service or startup script to stop the Vault service on each server node in the cluster.

Example:

$ sudo systemctl stop vault

Stop the standby servers in the cluster first, and then finally stop the active server.
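
The ordering described above (standbys first, active server last) can be sketched as a small helper. The function name stop_order, the host names, and the use of SSH in the commented example are all assumptions for illustration.

```shell
# Print node names in a safe stop order: every standby first, the active last.
# Usage: stop_order ACTIVE NODE...
# stop_order is a hypothetical helper name.
stop_order() {
  active=$1
  shift
  for node in "$@"; do
    if [ "$node" != "$active" ]; then
      printf '%s\n' "$node"
    fi
  done
  printf '%s\n' "$active"
}

# Example (hypothetical host names; assumes SSH access to each node):
# for node in $(stop_order vault-0 vault-0 vault-1 vault-2); do
#   ssh "$node" sudo systemctl stop vault
# done
```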

After stopping all the servers, start the active server in recovery mode.

»Start previous active server in recovery mode

The /sys/raw API endpoint is not enabled by default. You must start a single Vault server in recovery mode, then generate a recovery mode operation token to access the /sys/raw endpoint used in this guide.

Review the Recovery Mode documentation, which describes the required -recovery runtime configuration flag. You should refer to that documentation before configuring your Vault server's startup script to start Vault in recovery mode.

When you have one Vault server operating in recovery mode, generate a recovery mode operation token, and then use that token for all operations in this guide.
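
As a rough sketch, starting the server by hand in recovery mode could look like the following. The configuration file path is an assumption (use whatever path your service definition uses), and VAULT_BIN is a hypothetical override variable, not something Vault itself reads.

```shell
# Start a Vault server in recovery mode using the given configuration file.
# The config path passed in is an assumption; use the path your service uses.
# VAULT_BIN is a hypothetical override that lets you substitute another binary.
start_recovery_mode() {
  config_path=$1
  "${VAULT_BIN:-vault}" server -config="$config_path" -recovery
}

# Example: start_recovery_mode /etc/vault.d/vault.hcl
```

The server runs in the foreground when started this way, so use a separate terminal session for the token generation steps.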

»Generate recovery mode operation token

All examples of querying the /sys/raw endpoint in this guide require a recovery mode operation token. The following steps generate one with the vault CLI using vault operator generate-root.

  1. Generate a one-time password (OTP).

    $ vault operator generate-root -generate-otp -recovery-token
    

    Example output:

    VTFdQgNjmSSCfiWmaztJgZa6MN
    
  2. Use the OTP value to initialize the token generation process.

    $ vault operator generate-root -init \
        -otp=VTFdQgNjmSSCfiWmaztJgZa6MN \
        -recovery-token
    

    Example output:

    Nonce         13829b90-94eb-b7d8-f774-7b495569562d
    Started       true
    Progress      0/1
    Complete      false
    OTP Length    26
    
  3. You must pass in a quorum of unseal or recovery keys as necessary to generate the encoded token.

    $ vault operator generate-root -recovery-token
    
    Operation nonce: 13829b90-94eb-b7d8-f774-7b495569562d
    Unseal key (will be hidden):
    
  4. Enter the unseal key (or recovery key if auto-unseal is enabled) when prompted. Successful output resembles this example and includes the encoded token.

    Nonce            13829b90-94eb-b7d8-f774-7b495569562d
    Started          true
    Progress         1/1
    Complete         true
    Encoded Token    JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo
    
  5. Decode the encoded token to produce the recovery mode operation token.

    $ vault operator generate-root \
      -decode=JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo \
      -otp=VTFdQgNjmSSCfiWmaztJgZa6MN \
      -recovery-token
    

    Example output:

    r.y8TcOwoZ1yBABbUlmc9dbL5d
    

    Note the r. prefix designating this as a recovery mode operation token.

  6. Use the value of the recovery mode operation token that you generate for all examples of listing and reading /sys/raw/... paths throughout the guide.

    Example:

    $ VAULT_TOKEN=r.y8TcOwoZ1yBABbUlmc9dbL5d vault list /sys/raw/
    
    Keys
    ----
    audit/
    core/
    index-dr/
    index/
    logical/
    sys/
    wal/
    

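    To avoid passing the recovery mode operation token with every command, you can export it as the VAULT_TOKEN environment variable in your shell session.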
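
Before exporting a token as VAULT_TOKEN for the session, a quick check of its prefix can catch pasting the wrong value (for example, the OTP or the encoded token). This is a minimal sketch: both function names are hypothetical, and the r. prefix check is an assumption tied to the example output in this guide, since token formats can differ between Vault versions.

```shell
# Succeeds only if the argument looks like a recovery mode operation token,
# i.e. it starts with the "r." prefix shown in this guide's example output.
is_recovery_token() {
  case $1 in
    r.?*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Export the token for the current shell session only after the check passes.
# use_recovery_token is a hypothetical helper name.
use_recovery_token() {
  if is_recovery_token "$1"; then
    VAULT_TOKEN=$1
    export VAULT_TOKEN
  else
    printf 'not a recovery mode operation token\n' >&2
    return 1
  fi
}

# Example: use_recovery_token r.y8TcOwoZ1yBABbUlmc9dbL5d
```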

»Inspect Vault data

After generating a recovery mode operation token, you are ready to begin inspecting data with the /sys/raw API endpoint.

»New Vault server data example

Let's get to know some Vault data by viewing an actual example with descriptions of each element. When Vault is first initialized and unsealed the persisted data will resemble this example.

sys
 └── raw
     ├── core
     │   ├── _audit
     │   ├── _auth
     │   ├── _keyring
     │   ├── _local-audit
     │   ├── _local-auth
     │   ├── _local-mounts
     │   ├── _master
     │   ├── _mounts
     │   ├── _seal-config
     │   ├── _shamir-kek
     │   ├── cluster
     │   │   └── local
     │   │       └── _info
     │   ├── hsm
     │   │   └── _barrier-unseal-keys
     │   └── wrapping
     │       └── _jwtkey
     ├── index
     │   ├── _checkpoint
     │   └── pages
     │       ├── _04
     │       ├── _0f
     │       ├── _10
     │       ├── _a2
     │       ├── _a9
     │       ├── _b3
     │       ├── _f2
     │       └── _fe
     ├── index-dr
     │   ├── _checkpoint
     │   └── pages
     │       ├── _1a
     │       ├── _22
     │       ├── _58
     │       ├── _88
     │       ├── _d1
     │       ├── _e1
     │       └── _e4
     ├── logical
     │   └── a7f1489c-7e22-06e6-6c88-37fafe32f4f1
     │       └── _casesensitivity
     ├── sys
     │   ├── policy
     │   │   ├── _control-group
     │   │   ├── _default
     │   │   └── _response-wrapping
     │   └── token
     │       ├── _salt
     │       ├── accessor
     │       │   └── _e2c12bf2ce316b10dc60ec173b53aaff8fc402ca
     │       └── id
     │           └── _h322cd680d8ee8b202dd8ffdc110d0c212022d57934b6e7aa65f6b869b27a0ec4
     └── wal
         └── logs
             └── 00000000
                 ├── _0001
                 ├── _0002
                 ├── _0003
                 ├── _0004
                 ├── _0005
                 ├── _0006
                 ├── _0007
                 ├── _0008
                 ├── _0009
                 ├── _000a
                 ├── _000b
                 ├── _000c
                 ├── _000d
                 ├── _000e
                 ├── _000f
                 └── _0010

A total of 53 key/value pairs are present in this example, representing all of the data necessary for Vault to begin operations. A Vault server that is in production will have considerably more data and key/value pairs related to its specific auth methods, secrets engines, and so on.

Here is a brief explanation of each major branch and the elements within it from the example.

  • core: Items contained in core are critical and internal to Vault operations; these include data about internal auditing, authentication, keyring, mounts, the master key, the seal configuration, cluster information, HSM barrier unseal keys, seal wrapping, and more.
  • index: This is local index data which can be used by the Performance Standby feature.
  • index-dr: This is index data for the Disaster Recovery mode of Vault Enterprise Replication.
  • logical: Dynamic secret configuration and static secrets are found here.
  • sys: System data includes policy configuration along with tokens and their accessors.
  • wal: Write ahead logs (WALs) are present in Vault Enterprise installations to support the Performance Standby feature and assist with enabling Enterprise Replication.

Those are the basics for now.

You will continue by inspecting secrets engine data.

»Secret engine data example

A common question about Vault secret data during support and operations troubleshooting scenarios is "How many secrets exist in Vault for a particular secrets engine?"

To answer this question, first develop some understanding of the secrets engine data storage structure with further examples.

Vault stores secrets engine data at /sys/raw/logical/<UUID>/ where <UUID> represents a unique identifier for each enabled secrets engine.

When a new Vault is initialized and unsealed, only the identity secrets engine is configured and present in the storage as shown in the example data:

├── logical
│   └── a7f1489c-7e22-06e6-6c88-37fafe32f4f1
│       └── _casesensitivity

After Vault is further configured and with additional secrets engines enabled, the logical path would be expected to contain more secrets engine data.

For example, here is a tree view of example secrets engine data with detailed explanation of each element.

├── logical
│   ├── e2b7c3e2-3e21-3391-b73c-8a991a65789d
│   │   └── f030471f-1f42-6c61-9d42-179427741f49
│   │       ├── _salt
│   │       ├── _upgrading
│   │       ├── archive
│   │       │   └── _metadata
│   │       ├── metadata
│   │       │   └── _Fz2pkGY3Mo2Umyt7REtEpyNFJwWVrmS54tZbMBfbJDuuYhtcl6Wmgy1Byo7cqI8R3yUNkmtjAfb9Omw4mQJ
│   │       ├── policy
│   │       │   └── _metadata
│   │       └── versions
│   │           ├── 580
│   │           │   └── _081e7244b38d5761da22c6958357d27cb28fc31a8d306e356bc371db1021f
│   │           └── dfd
│   │               └── _dca872765e8bb300874869f76994b3b7f5b811ea6103b2367cbe4261d55e4
│   ├── 2788376d-7042-4737-1ebd-9f6391a01f4e
│   │   ├── _ca
│   │   ├── _crl
│   │   ├── _urls
│   │   ├── config
│   │   │   └── _ca_bundle
│   │   └── role
│   │       └── _tacobot-root
│   ├── b7183aba-6e64-e001-fe57-3e7e4508fc0c
│   │   ├── _ca
│   │   ├── _crl
│   │   ├── _urls
│   │   ├── certs
│   │   │   ├── _17-57-81-3f-5f-08-43-00-79-97-b5-0c-b3-0e-5e-cd-49-a5-88-21
│   │   │   └── _19-19-9f-17-91-3b-8d-da-77-c2-c2-f9-37-1a-4f-19-4c-5b-f2-9a
│   │   ├── config
│   │   │   └── _ca_bundle
│   │   └── role
│   │       └── _tacobot-int
│   ├── cb1bfb31-3ccb-ef29-6352-874902c3a021
│   │   ├── config
│   │   │   ├── _mongodb
│   │   │   └── _mysql
│   │   └── role
│   │       ├── _tacobot-mongodb-readonly
│   │       └── _tacobot-mysql-readonly
│   ├── d1689597-4f78-a30b-7532-e7806be9fcba
│   │   └── _casesensitivity
│   └── fbd73ad9-4f9c-45be-5be2-3758d04808af
│       └── 9t7pwHwrPD0yiGuLKMHi912x
│           └── _my-secret

The previous example shows paths for several secrets engines in the root namespace; here are details on each secrets engine and its associated elements:

  • e2b7c3e2-3e21-3391-b73c-8a991a65789d: A KV Secrets Engine - Version 2 containing internal configuration and metadata along with the secret data versions found under the versions key.
  • 2788376d-7042-4737-1ebd-9f6391a01f4e: A PKI secrets engine which represents the root Certificate Authority (CA). It contains the CA information, the Certificate Revocation List (CRL) data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-root.
  • b7183aba-6e64-e001-fe57-3e7e4508fc0c: A PKI secrets engine which represents the intermediate CA. It contains the CA information, the CRL data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-int. Note also that it has a certs key holding certificate serial numbers, which represent the certificates issued from the tacobot-int role.
  • cb1bfb31-3ccb-ef29-6352-874902c3a021: A Database Secrets Engine with configuration and roles for MongoDB and MySQL.
  • d1689597-4f78-a30b-7532-e7806be9fcba: An Identity Secrets Engine, which is the identity management solution for Vault and is enabled by default. This secrets engine cannot be disabled or moved.
  • fbd73ad9-4f9c-45be-5be2-3758d04808af: A Cubbyhole Secrets Engine, which is enabled by default. It cannot be disabled, moved, or enabled multiple times.

Now that you are a bit more familiar with the general shape of data in Vault, let's move on to a full workflow where you actually inspect some data in a Vault cluster that uses the Integrated Storage backend.

Here are some examples of different data points available from inspecting Vault Integrated Storage, along with CLI and HTTP API command examples.

»Auth method data

The following are examples for getting information about enabled auth methods and their associated users.

»List enabled auth methods

This is conceptually similar to a vault auth list command or using the List auth Methods API with the significant difference being that output contains auth methods described by their internally assigned UUIDs instead of their human friendly names.

Use vault list in combination with jq to list enabled auth methods like this.

$ vault list -format=json sys/raw/auth/ | jq -r '.[]'

Example output:

64175a63-7172-95d9-5641-2f14296184a8/
b8acd19c-875d-8e19-3252-ebc1ca1ea936/

»Count auth method users

You can get a count of existing users for a given auth method, such as the Username and Password auth method, from the sys/raw/auth/$UUID/user path.

The $UUID portion (b8acd19c-875d-8e19-3252-ebc1ca1ea936) of the example path should be replaced with the value of an actual auth method UUID in your own Vault data.

Use vault list in combination with jq to list the number of users configured in the specified auth method like this.

$ vault list -format=json \
   sys/raw/auth/b8acd19c-875d-8e19-3252-ebc1ca1ea936/user/ \
   | jq '. | length'

Example output:

32

There are 32 users configured for this username and password auth method.
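
When several auth methods are enabled, the same count can be repeated for each UUID. The following is a minimal sketch under a few assumptions: it parses the default table output of vault list (a Keys header, a rule line, then one key per line) instead of JSON, it assumes every auth method of interest stores its users under a user/ sub-path, and both function names are hypothetical.

```shell
# Read `vault list` table output on stdin and print the number of keys.
# Assumes the default table format: a "Keys" header, a "----" rule, then keys.
count_listed_keys() {
  awk 'NR > 2 && NF { n++ } END { print n + 0 }'
}

# Print "<uuid> <count>" for each auth method UUID, counting keys under user/.
# Auth methods without a user/ sub-path are reported with a count of 0.
users_per_auth_method() {
  vault list sys/raw/auth/ | awk 'NR > 2 && NF' | while IFS= read -r uuid; do
    count=$(vault list "sys/raw/auth/${uuid}user/" 2>/dev/null </dev/null | count_listed_keys)
    printf '%s %s\n' "${uuid%/}" "$count"
  done
}

# Example: users_per_auth_method
```

Each UUID results in one additional list request, so the loop makes as many calls as there are enabled auth methods plus one.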

»Secrets engines data

Secrets engine data are found under the path sys/raw/logical.

The following are examples for getting information about enabled secrets engines and their associated secrets.

This is similar in concept to a vault secrets list command or using the List Mounted Secrets Engines API with the significant difference being that output contains secrets engines described by their internally assigned UUIDs instead of their human friendly names.

»List enabled secrets engines

Use vault list to list enabled secrets engines like this.

$ vault list /sys/raw/logical/

Example output:

Keys
----
693f4e64-fd85-171c-1639-ba96f118c447/
aaf7022b-6c02-118b-a8ee-4183fee3463b/
b2267f49-045c-dbe7-5491-f5173f364a62/
ec6cc06c-d245-ca11-2b23-4eea29406073/
f63e8c3d-e709-0168-8be0-dca447da699f/

»Token and accessor data

Active tokens and their accessors are found under the path sys/raw/sys/token.

Here is an example for counting active tokens.

Use curl in combination with jq to count active tokens like this.

$ curl \
  --silent \
  --header "X-Vault-Token: $VAULT_TOKEN" \
  --request LIST \
  $VAULT_ADDR/v1/sys/raw/sys/token/id/ \
  | jq '.data.keys | length'

Example output:

32

Here is an example for counting active token accessors.

Use curl in combination with jq to count active token accessors like this.

$ curl \
  --silent \
  --header "X-Vault-Token: $VAULT_TOKEN" \
  --request LIST \
  $VAULT_ADDR/v1/sys/raw/sys/token/accessor/ \
  | jq '.data.keys | length'

Example output:

32

»Lease data

The following are some examples for getting information about leases associated with auth methods. It is not possible to recursively list keys with the /sys/raw API, so you must be specific and manually total multiple paths when necessary.

»Count auth method leases

Here is an example of counting the leases for an existing AppRole auth method that is enabled at the default path approle.

Use vault list in combination with jq to count leases in the specified auth method.

$ vault list -format=json sys/raw/sys/expire/id/auth/approle/login/ \
  | jq '. | length'

Example output:

10

There are 10 active leases in Vault for this approle auth method. You can use the different names of your auth method paths from your own auth method list output to check leases in other auth methods.
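
Because the /sys/raw API does not list recursively, totals that span several paths have to be assembled on the client side. One possible approach is a small recursive walker, sketched below. It assumes the default table output of vault list, key names without newlines, and a valid token already exported as VAULT_TOKEN; the function name walk_raw is hypothetical.

```shell
# Recursively print every leaf key below a /sys/raw path, one per line.
# The body runs in a subshell (parentheses) so variables stay local per call.
# Assumes the default `vault list` table output and key names without newlines.
walk_raw() (
  prefix=${1%/}
  vault list "$prefix/" 2>/dev/null </dev/null | awk 'NR > 2 && NF' |
  while IFS= read -r key; do
    case $key in
      */) walk_raw "$prefix/${key%/}" ;;
      *)  printf '%s/%s\n' "$prefix" "$key" ;;
    esac
  done
)

# Example: total leases below every auth method login path
# walk_raw sys/raw/sys/expire/id/auth | wc -l
```

The walker issues one list request per directory-like key, so it can be slow on large data sets; narrowing the starting path keeps the number of requests down.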

»Write Ahead Log data

Write Ahead Logs (WALs) are found under the path sys/raw/wal/logs/.

»Count Write Ahead Logs

First, here is a plain list output example.

$ vault list -format=json sys/raw/wal/logs/

In this case, the output is a single containing key named 00000000, in which each individual WAL object resides.

[
  "00000000/"
]

Listing this key and taking the length of the result gives the count of WALs in 00000000.

Use vault list in combination with jq to get a count of Write Ahead Logs (WALs) from the storage with a command like this.

$ vault list -format=json sys/raw/wal/logs/00000000 | jq '. | length'

Example output:

360

These examples should be enough to get you started in inspecting your own Vault data when it becomes necessary to get specific answers to aid in troubleshooting.

»What about Vault Enterprise namespaces?

Namespaces were introduced in Vault Enterprise version 0.11.0.

This changes the previous procedures slightly, in that each namespace encapsulates its own leases and tokens in paths under the namespace's internal storage path name, which is not the same as its user-configured name.

This is an example tree of paths from a minimal Vault instance for purposes of illustration:

├── namespaces
│   ├── 5Gsx8
│   │   └── sys
│   │       ├── expire
│   │       │   └── id
│   │       │       ├── auth
│   │       │       │   └── approle
│   │       │       │       └── login
│   │       │       │           ├── _h77ec2e32d144d6d75873a8b2098996b14ee70ed18addf6a4f5dd99d678d28b0b.5Gsx8
│   │       │       │           ├── _h8fba78bf0d4e497eb662709bf9e0f1e1ae0b053ec7cb3dbd3bc45ec76c66b693.5Gsx8
│   │       │       │           ├── _hf15a2ad701c9ec86089a5eb4704d2fbbbcff1fe66d118d005107a7dc83fb21ec.5Gsx8
│   │       │       │           └── _hf2b6bbafc5db6125191e09cbc7eaee9a080953217df4a105601c1fbf8b8ce492.5Gsx8
│   │       │       └── database
│   │       │           └── creds
│   │       │               └── mongodb-example-namespace-readonly
│   │       │                   ├── _HoNqW43Gy4N5Mbg3LzOEpuLa.5Gsx8
│   │       │                   ├── _OKHi04a98CLO2CYMMXHwduNj.5Gsx8
│   │       │                   ├── _YbfCjSZKwHUJeoAB31wIwn7f.5Gsx8
│   │       │                   └── _nNcN5KNlzh47JwhmGVMIMvkV.5Gsx8
│   │       ├── policy
│   │       │   ├── _control-group
│   │       │   ├── _default
│   │       │   └── _response-wrapping
│   │       └── token
│   │           ├── _salt
│   │           ├── accessor
│   │           │   ├── _h61e29be9d79c1d597cefab1ae8242fc8dffa6cb72ea1514260f1250632401638
│   │           │   ├── _ha6ddbad7b6237b0de4c03faf9cb82ae87aca4a8912b9c4b09c9f6b00dbe7c32e
│   │           │   ├── _ha9d7d995ddc93c4e136d686fc24ef4ca9f28e4793358b49e68c7e70e13843815
│   │           │   └── _he05dc5ca4f1ba60b20e406d4215d4d469a2a26231fab8f5067a97893147d95ea
│   │           └── id
│   │               ├── _h77ec2e32d144d6d75873a8b2098996b14ee70ed18addf6a4f5dd99d678d28b0b
│   │               ├── _h8fba78bf0d4e497eb662709bf9e0f1e1ae0b053ec7cb3dbd3bc45ec76c66b693
│   │               ├── _hf15a2ad701c9ec86089a5eb4704d2fbbbcff1fe66d118d005107a7dc83fb21ec
│   │               └── _hf2b6bbafc5db6125191e09cbc7eaee9a080953217df4a105601c1fbf8b8ce492
│   └── _info

The actual user-configured namespace name is example-namespace, but this is stored internally as a short unique identifier instead; in the case of our above example, this value is 5Gsx8.

Once you have determined the storage path for the namespace, you can compose commands similar to those previously shown for the root namespace.

Here is an example of counting the leases for an existing AppRole auth method that is enabled at the default path approle in the example-namespace (5Gsx8) namespace.

Use vault list in combination with jq to count leases in the specified auth method.

$ vault list -format=json \
  sys/raw/namespaces/5Gsx8/sys/expire/id/auth/approle/login \
  | jq '. | length'

Example output:

10

Likewise for a count of active tokens, the following example can be used as a starting point.

$ vault list -format=json /sys/raw/namespaces/5Gsx8/sys/token/id/ \
  | jq '. | length'

Example output:

10
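
If you switch between the root namespace and other namespaces often, composing the path prefix can be wrapped in a tiny helper. This is only a sketch; the function name raw_prefix is hypothetical, and the argument must be the internal storage identifier (such as 5Gsx8), not the user-configured namespace name.

```shell
# Print the /sys/raw prefix for the root namespace (no argument) or for a
# namespace storage identifier such as 5Gsx8 (not the user-configured name).
# raw_prefix is a hypothetical helper name.
raw_prefix() {
  if [ -n "${1:-}" ]; then
    printf 'sys/raw/namespaces/%s/' "$1"
  else
    printf 'sys/raw/'
  fi
}

# Example: vault list -format=json "$(raw_prefix 5Gsx8)sys/token/id/"
```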

After inspecting data, you can move on to stopping the recovery mode server, starting it normally, and rejoining other servers which participate in the cluster.

»Stop recovery mode server and start normally

Starting Vault in recovery mode with Integrated Storage resets the cluster member list, effectively reducing cluster members to 1 as described in the documentation on Raft rejoin.

Once you have completed inspection of your Vault data, you can stop the recovery mode server and then start it again normally (i.e., without the -recovery flag) so that you can rejoin the other cluster servers to it, and re-establish a highly available cluster.

When the Vault server is successfully started for normal operations, ensure that it is unsealed and active with vault status or the /sys/seal-status API before joining the standby servers to it.
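
The unseal check can be scripted by polling vault status, which exits with 0 when the server is unsealed, 2 when it is sealed, and 1 on error. The sketch below uses a hypothetical function name, and its retry count and delay are illustrative defaults only. It checks the seal state, not whether the server has become active.

```shell
# Poll `vault status` until the server reports unsealed, or give up.
# `vault status` exits 0 when unsealed, 2 when sealed, and 1 on error.
# The attempt count and delay are illustrative defaults, not recommendations.
wait_until_unsealed() {
  attempts=${1:-30}
  delay=${2:-2}
  while [ "$attempts" -gt 0 ]; do
    if vault status >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait_until_unsealed 30 2 && echo "unsealed"
```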

»Start and join standby servers

With one Vault server active and unsealed, you can now proceed to joining the standby servers. Because recovery mode changes the cluster size, you must ensure that the standby servers are free of all existing Vault data before you attempt to join them to the active server, from which they will receive the updated data.

You have two choices for achieving this requirement.

  1. Join entirely new standby servers; this is most helpful if, for example, there are uncorrectable issues with the existing servers.
  2. Remove the contents of the path directory that holds data for each of the standby Vault servers before starting them.

Once you have decided and implemented a strategy, go ahead and start the standby Vault servers and join each of them to the active server.

»Validate cluster

After joining all standby servers to the active server, you can validate your cluster health with vault operator raft list-peers.

Example:

$ vault operator raft list-peers

Example output:

Node       Address              State       Voter
----       -------              -----       -----
vault-0    10.10.42.200:8201    leader      true
vault-1    10.10.42.201:8201    follower    true
vault-2    10.10.42.202:8201    follower    true
vault-3    10.10.42.203:8201    follower    true
vault-4    10.10.42.204:8201    follower    true

The example output shows that all 5 servers are up and that vault-0 is the active leader.
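
The same check can be automated by parsing the table output. The following sketch assumes the column layout shown in the example output above (node, address, state, voter) and uses a hypothetical function name, check_peers.

```shell
# Read `vault operator raft list-peers` table output on stdin and verify that
# there is exactly one leader and the expected number of peers.
# check_peers is a hypothetical helper name. Usage: ... | check_peers 5
check_peers() {
  awk -v want="$1" '
    NR > 2 && NF {
      peers++
      if ($3 == "leader") leaders++
    }
    END {
      printf "peers=%d leaders=%d\n", peers, leaders
      exit !(peers == want && leaders == 1)
    }
  '
}

# Example: vault operator raft list-peers | check_peers 5
```

The function prints a one-line summary and returns a non-zero status when the peer count or leader count is not what you expect, which makes it usable in a post-maintenance script.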

»Summary

In this guide, you learned how Vault stores its operational data in the Integrated Storage backend and how to access these data through the /sys/raw API endpoint with Vault operating in recovery mode.

You should be able to use what you have learned here as a starting point in inspecting and measuring the characteristics most important to you about your Vault data.

»Help and reference

  1. Integrated Storage backend
  2. Recovery Mode
  3. Recovery mode for troubleshooting
  4. /sys/raw API
  5. vault auth list
  6. List Auth Methods API
  7. Username and Password auth method
  8. root token
  9. sudo capabilities
  10. operator unseal
  11. seal
  12. Raft rejoin
  13. vault operator raft list-peers