This tutorial focuses on inspecting Vault data stored in Integrated Storage. Also, refer to Vault Limits and Maximums for known upper limits on the size of certain fields and objects, and configurable limits on others.
In production deployments, Vault persists critical operational data to its configured durable storage. Gathering key facts about these operational data can often be extremely helpful when engaged in advanced troubleshooting.
During this type of troubleshooting for Vault servers which use the Integrated Storage backend, you can activate Recovery Mode on a server and inspect Vault data in an offline manner using an extremely limited API that prevents general use of Vault while troubleshooting is in progress.
One approach currently available for inspecting data in a server operating in recovery mode is to use the /sys/raw API endpoint.
This tutorial provides a detailed workflow for inspecting Vault data with example commands and responses to help familiarize you with this approach.
NOTE: There is a separate, but related tutorial for inspecting Vault data in Consul that describes a different approach for inspecting data in Vault installations which use the Consul storage backend.
- Notes and prerequisites
- Problem
- Solution
- Workflow
- Stop all Vault cluster members
- Start previous active server in recovery mode
- Generate recovery mode operation token
- Inspect Vault data
- Stop recovery mode server and start normally
- Start and join standby servers
- Validate cluster
- Summary
- Help and Reference
»Notes and prerequisites
Please pay attention to the following list of important notes and prerequisites before you begin with this tutorial.
- This is not meant to be a comprehensive tutorial covering every possible type of data which is found in Vault; rather, it is a means to help you get started in inspecting your Vault data so that you can familiarize yourself with the process.
- Vault version 1.4.0 or later using the Integrated Storage backend is required.
- To inspect Vault data using the vault CLI and HTTP API, this tutorial uses tools such as cURL and jq to fetch the information and process it.
- The HTTP API examples expect that the environment variable VAULT_TOKEN has the Recovery Token as its value.
- This tutorial uses examples from the root namespace; for details about inspecting data within other namespaces created using the Enterprise Namespaces feature, please see the What about Vault Enterprise namespaces? section.
CAUTION: All examples shown in this tutorial are read-only in nature and use only GET and LIST operations. Do not attempt any other operations outside of those demonstrated in this tutorial while the Vault server is in recovery mode.
»Problem
Certain advanced Vault troubleshooting situations can benefit from identifying issues with data stored in Vault, for example identifying problematic auth methods or secrets engines which generate too many leases, or investigating inexplicable growth in write ahead logs (WALs).
As Vault does not expose these kinds of metrics for the data in storage directly to the user, you must query the storage directly using available tooling and techniques.
In these situations, you might need to isolate the Vault data from active use while inspecting it, to prevent further state changes by applications and clients. A Vault server can be started in Recovery Mode with a restricted API that provides access to the data for inspection and troubleshooting while preventing general purpose use of the Vault server during that time.
»Solution
Details about data stored in integrated storage can be retrieved from the /sys/raw API endpoint while Vault is operating in recovery mode.
This tutorial describes how Vault organizes its data in storage, and provides an example of a practical and safe workflow for inspecting the data in Integrated Storage.
Before beginning with the practical tutorial, let's take some time to learn about Vault data in durable storage.
»Workflow
The workflow for examining data in Integrated Storage is as follows.
- Stop all Vault cluster servers
- Start previous active server in recovery mode
- Generate a Recovery Mode operation token
- Inspect Vault data as required on this server with the /sys/raw API; the majority of examples use the List Raw API
- Stop the server
- Start the server in normal mode
- Rejoin servers in a new state (i.e., with no existing data) to the active server, using one of the following strategies:
  - Join 4 new standby servers with no existing data to the active server to form the cluster, OR
  - Wipe Vault data on the existing 4 standby servers and join them as new servers to the active server
The goal is to restore cluster operations and original size after inspecting data, so choose which approach makes most sense for your environment and the circumstances.
For more details, please refer to the recovery mode documentation.
»Stop all Vault cluster members
The first step in this workflow is to stop all Vault servers in the cluster, beginning with the standby servers.
Use the configured operating system service or startup script to stop the Vault service on each server node in the cluster.
Example:
$ sudo systemctl stop vault
Stop the standby servers in the cluster first, and then finally stop the active server.
After stopping all the servers, start the active server in recovery mode.
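The stop order described above can be scripted. The following is a minimal sketch, assuming SSH access to every node and a systemd unit named vault as in the example; the stop_cluster function and the host names in the usage line are hypothetical placeholders, not part of Vault.

```shell
# Stop Vault on every standby first, then on the active server last.
# Usage: stop_cluster ACTIVE_HOST STANDBY_HOST...
# Assumptions: SSH access to each host and a systemd unit named "vault".
stop_cluster() {
  active="$1"
  shift
  for host in "$@"; do
    echo "Stopping standby: $host"
    ssh "$host" sudo systemctl stop vault || return 1
  done
  echo "Stopping active: $active"
  ssh "$active" sudo systemctl stop vault
}
```

For a five-node cluster this might be called as stop_cluster vault-0 vault-1 vault-2 vault-3 vault-4, where vault-0 is the node that was active.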
»Start previous active server in recovery mode
NOTE: When Vault is started in recovery mode, only a subset of its API is available for generating recovery tokens and using the /sys/raw API. This means that Vault is completely unavailable for general use while operating in recovery mode.
The /sys/raw API endpoint is not enabled by default. You must start a single Vault server in recovery mode, then generate a recovery mode operation token to access the /sys/raw endpoint used in this tutorial.
Review the Recovery Mode documentation, which describes the required -recovery runtime configuration flag, before configuring your Vault server's startup script to start Vault in recovery mode.
Once you have one Vault server operating in recovery mode, generate a recovery mode operation token, and then use that token for all operations in this tutorial.
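As a rough illustration of the -recovery flag, the sketch below wraps the server start command in a small function. The function name and the default configuration path /etc/vault.d/vault.hcl are assumptions; substitute the configuration file and service user that your own startup script normally uses.

```shell
# Start a single Vault server in recovery mode (runs in the foreground).
# Usage: start_recovery_mode [CONFIG_FILE]
# Assumption: the default config path below is only a placeholder.
start_recovery_mode() {
  config="${1:-/etc/vault.d/vault.hcl}"
  vault server -config="$config" -recovery
}
```

Running the server in the foreground like this makes it easy to stop again once inspection is complete.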
»Generate recovery mode operation token
All examples of querying the /sys/raw endpoint demonstrated in this tutorial require the use of a recovery mode operation token. As an example of the process, you will generate one here with the vault CLI using vault operator generate-root.
Generate a one-time password (OTP).
$ vault operator generate-root -generate-otp -recovery-token
VTFdQgNjmSSCfiWmaztJgZa6MN
Use the OTP value to initialize the token generation process.
$ vault operator generate-root -init \
    -otp=VTFdQgNjmSSCfiWmaztJgZa6MN \
    -recovery-token
Nonce         13829b90-94eb-b7d8-f774-7b495569562d
Started       true
Progress      0/1
Complete      false
OTP Length    26
You must pass in a quorum of unseal or recovery keys as necessary to generate the encoded token.
$ vault operator generate-root -recovery-token
Operation nonce: 13829b90-94eb-b7d8-f774-7b495569562d
Unseal key (will be hidden):
Enter the unseal key (or recovery key if auto-unseal is enabled) when prompted. Successful output resembles this example and includes the encoded token.
Nonce            13829b90-94eb-b7d8-f774-7b495569562d
Started          true
Progress         1/1
Complete         true
Encoded Token    JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo
Decode the encoded token to produce the recovery mode operation token.
$ vault operator generate-root \
    -decode=JHo/XAUEAR0CCWI6JCgVDzQWGSlePgN6eCo \
    -otp=VTFdQgNjmSSCfiWmaztJgZa6MN \
    -recovery-token
r.y8TcOwoZ1yBABbUlmc9dbL5d
Note the r. prefix, which designates this as a recovery mode operation token.
Use the value of the recovery mode operation token that you generate for all examples of listing and reading /sys/raw/... paths throughout the tutorial.
Example:
$ VAULT_TOKEN=r.y8TcOwoZ1yBABbUlmc9dbL5d vault list sys/raw/
Keys
----
audit/
core/
index-dr/
index/
logical/
sys/
wal/
To avoid passing the recovery mode operation token with each command, you can export it as the VAULT_TOKEN environment variable.
NOTE: Be sure to use this recovery mode operation token to inspect data as you follow this tutorial.
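One way to guard against accidentally using a regular token is to check for the r. prefix before exporting the value. This is a minimal sketch; the function names is_recovery_token and use_recovery_token are hypothetical helpers, and the prefix check only inspects the shape of the string without validating the token against Vault.

```shell
# Return success only if the value looks like a recovery mode
# operation token, i.e. it starts with the "r." prefix.
is_recovery_token() {
  case "$1" in
    r.?*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Export the token as VAULT_TOKEN for later vault and curl commands
# in this shell session, refusing values without the "r." prefix.
use_recovery_token() {
  if is_recovery_token "$1"; then
    export VAULT_TOKEN="$1"
  else
    echo "not a recovery mode operation token (expected r. prefix)" >&2
    return 1
  fi
}
```

For example, use_recovery_token r.y8TcOwoZ1yBABbUlmc9dbL5d sets VAULT_TOKEN for the rest of the session.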
»Inspect Vault data
After generating a recovery mode operation token, you are ready to begin inspecting data with the /sys/raw API endpoint.
NOTE: When inspecting Vault data in the Integrated Storage backend via recovery mode, remember that the /sys/raw prefix must be prepended to any paths you might be used to accessing directly in other storage backends.
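The prefix rule in the note above can be captured in a tiny helper. This is only a sketch; raw_path is a hypothetical function name, and it does nothing more than string manipulation.

```shell
# Prepend sys/raw/ to a storage path, tolerating a leading slash.
# Usage: raw_path STORAGE_PATH
raw_path() {
  path="${1#/}"
  printf 'sys/raw/%s\n' "$path"
}
```

A storage path such as sys/token/id/ could then be listed with vault list "$(raw_path sys/token/id/)".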
»New Vault server data example
Let's get to know some Vault data by viewing an actual example with descriptions of each element. When Vault is first initialized and unsealed the persisted data will resemble this example.
sys
└── raw
    ├── core
    │   ├── _audit
    │   ├── _auth
    │   ├── _keyring
    │   ├── _local-audit
    │   ├── _local-auth
    │   ├── _local-mounts
    │   ├── _master
    │   ├── _mounts
    │   ├── _seal-config
    │   ├── _shamir-kek
    │   ├── cluster
    │   │   └── local
    │   │       └── _info
    │   ├── hsm
    │   │   └── _barrier-unseal-keys
    │   └── wrapping
    │       └── _jwtkey
    ├── index
    │   ├── _checkpoint
    │   └── pages
    │       ├── _04
    │       ├── _0f
    │       ├── _10
    │       ├── _a2
    │       ├── _a9
    │       ├── _b3
    │       ├── _f2
    │       └── _fe
    ├── index-dr
    │   ├── _checkpoint
    │   └── pages
    │       ├── _1a
    │       ├── _22
    │       ├── _58
    │       ├── _88
    │       ├── _d1
    │       ├── _e1
    │       └── _e4
    ├── logical
    │   └── a7f1489c-7e22-06e6-6c88-37fafe32f4f1
    │       └── _casesensitivity
    ├── sys
    │   ├── policy
    │   │   ├── _control-group
    │   │   ├── _default
    │   │   └── _response-wrapping
    │   └── token
    │       ├── _salt
    │       ├── accessor
    │       │   └── _e2c12bf2ce316b10dc60ec173b53aaff8fc402ca
    │       └── id
    │           └── _h322cd680d8ee8b202dd8ffdc110d0c212022d57934b6e7aa65f6b869b27a0ec4
    └── wal
        └── logs
            └── 00000000
                ├── _0001
                ├── _0002
                ├── _0003
                ├── _0004
                ├── _0005
                ├── _0006
                ├── _0007
                ├── _0008
                ├── _0009
                ├── _000a
                ├── _000b
                ├── _000c
                ├── _000d
                ├── _000e
                ├── _000f
                └── _0010
A total of 73 key/value pairs are present in this example, representing all of the data necessary for Vault to begin operations. A Vault server that is in production will have considerably more data and key/value pairs related to its specific auth methods, secrets engines, and so on.
Here is a brief explanation of each major branch in the example and the elements within it.
- core: Items contained in core are critical and internal to Vault operations; these include data about internal auditing, authentication, keyring, mounts, the master key, the seal configuration, cluster information, HSM barrier unseal keys, seal wrapping, and more.
- index: This is local index data which can be used by the Performance Standby feature
- index-dr: This is index data for the Disaster Recovery mode of Vault Enterprise Replication
- logical: Dynamic secret configuration and static secrets are found here
- sys: System data includes policy configuration along with tokens and their accessors.
- wal: Write ahead logs (WAL) are present in Vault Enterprise installations to support the Performance Standby feature and assist with enabling Enterprise Replication
Those are the basics for now.
You will continue by inspecting secrets engine data.
»Secret engine data example
A common question about Vault secret data during support and operations troubleshooting scenarios is How many secrets exist in Vault for a particular secrets engine?
To answer this question, first develop some understanding of the secrets engine data storage structure with further examples.
Vault stores secrets engine data at /sys/raw/logical/<UUID>/, where <UUID> represents a unique identifier for each enabled secrets engine.
When a new Vault is initialized and unsealed, only the identity secrets engine is configured and present in the storage as shown in the example data:
├── logical
│   └── a7f1489c-7e22-06e6-6c88-37fafe32f4f1
│       └── _casesensitivity
After Vault is further configured and additional secrets engines are enabled, the logical path would be expected to contain more secrets engine data.
For example, here is a tree view of example secrets engine data, followed by a detailed explanation of each element.
├── logical
│   ├── e2b7c3e2-3e21-3391-b73c-8a991a65789d
│   │   └── f030471f-1f42-6c61-9d42-179427741f49
│   │       ├── _salt
│   │       ├── _upgrading
│   │       ├── archive
│   │       │   └── _metadata
│   │       ├── metadata
│   │       │   └── _Fz2pkGY3Mo2Umyt7REtEpyNFJwWVrmS54tZbMBfbJDuuYhtcl6Wmgy1Byo7cqI8R3yUNkmtjAfb9Omw4mQJ
│   │       ├── policy
│   │       │   └── _metadata
│   │       └── versions
│   │           ├── 580
│   │           │   └── _081e7244b38d5761da22c6958357d27cb28fc31a8d306e356bc371db1021f
│   │           └── dfd
│   │               └── _dca872765e8bb300874869f76994b3b7f5b811ea6103b2367cbe4261d55e4
│   ├── 2788376d-7042-4737-1ebd-9f6391a01f4e
│   │   ├── _ca
│   │   ├── _crl
│   │   ├── _urls
│   │   ├── config
│   │   │   └── _ca_bundle
│   │   └── role
│   │       └── _tacobot-root
│   ├── b7183aba-6e64-e001-fe57-3e7e4508fc0c
│   │   ├── _ca
│   │   ├── _crl
│   │   ├── _urls
│   │   ├── certs
│   │   │   ├── _17-57-81-3f-5f-08-43-00-79-97-b5-0c-b3-0e-5e-cd-49-a5-88-21
│   │   │   └── _19-19-9f-17-91-3b-8d-da-77-c2-c2-f9-37-1a-4f-19-4c-5b-f2-9a
│   │   ├── config
│   │   │   └── _ca_bundle
│   │   └── role
│   │       └── _tacobot-int
│   ├── cb1bfb31-3ccb-ef29-6352-874902c3a021
│   │   ├── config
│   │   │   ├── _mongodb
│   │   │   └── _mysql
│   │   └── role
│   │       ├── _tacobot-mongodb-readonly
│   │       └── _tacobot-mysql-readonly
│   ├── d1689597-4f78-a30b-7532-e7806be9fcba
│   │   └── _casesensitivity
│   └── fbd73ad9-4f9c-45be-5be2-3758d04808af
│       └── 9t7pwHwrPD0yiGuLKMHi912x
│           └── _my-secret
The previous example shows paths for several secrets engines in the root namespace; here are details on each secrets engine and its associated elements:
- e2b7c3e2-3e21-3391-b73c-8a991a65789d A KV Secrets Engine - Version 2 containing internal configuration and metadata along with the secret data versions found under the versions key.
- 2788376d-7042-4737-1ebd-9f6391a01f4e A PKI secrets engine which represents the root Certificate Authority (CA). It contains the CA information, the Certificate Revocation List (CRL) data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-root.
- b7183aba-6e64-e001-fe57-3e7e4508fc0c A PKI secrets engine which represents the intermediate CA. It contains the CA information, the CRL data, the URL configuration, internal configuration (with a CA bundle), and a role, in this case called tacobot-int. Note also that it has a certs key with some certificate serial numbers present, which represent the certificates issued from the tacobot-int role.
- cb1bfb31-3ccb-ef29-6352-874902c3a021 A Database Secrets Engine with configuration and roles for MongoDB and MySQL.
- d1689597-4f78-a30b-7532-e7806be9fcba The Identity Secrets Engine, which is the identity management solution for Vault and is enabled by default. This secrets engine cannot be disabled or moved.
- fbd73ad9-4f9c-45be-5be2-3758d04808af A Cubbyhole Secrets Engine which is enabled by default. It cannot be disabled, moved, or enabled multiple times.
Now that you are a bit more familiar with the general shape of data in Vault, let's move on to a full workflow where you actually inspect some data in a Vault cluster that uses the Integrated Storage backend.
Here are some examples of different data points available from inspecting Vault Integrated Storage, along with CLI and HTTP API command examples.
»Auth method data
The following are examples for getting information about enabled auth methods and their associated users.
»List enabled auth methods
This is conceptually similar to a vault auth list command or using the List Auth Methods API, with the significant difference being that the output contains auth methods described by their internally assigned UUIDs instead of their human-friendly names.
Use vault list in combination with jq to list enabled auth methods like this.
$ vault list -format=json sys/raw/auth/ | jq -r '.[]'
64175a63-7172-95d9-5641-2f14296184a8/
b8acd19c-875d-8e19-3252-ebc1ca1ea936/
»Count auth method users
You can get a count of existing users for a given auth method, such as the Username and Password auth method, from the sys/raw/auth/$UUID/user path.
Replace the $UUID portion of the path (b8acd19c-875d-8e19-3252-ebc1ca1ea936 in this example) with the value of an actual auth method UUID in your own Vault data.
Use vault list in combination with jq to count the users configured in the specified auth method like this.
$ vault list -format=json \
sys/raw/auth/b8acd19c-875d-8e19-3252-ebc1ca1ea936/user/ \
| jq '. | length'
32
There are 32 users configured for this username and password auth method.
»Secrets engines data
Secrets engine data are found under the path sys/raw/logical.
The following are examples for getting information about enabled secrets engines and their associated secrets.
This is similar in concept to a vault secrets list command or using the List Mounted Secrets Engines API, with the significant difference being that the output contains secrets engines described by their internally assigned UUIDs instead of their human-friendly names.
»List enabled secrets engines
Use vault list to list enabled secrets engines like this.
$ vault list sys/raw/logical/
Keys
----
693f4e64-fd85-171c-1639-ba96f118c447/
aaf7022b-6c02-118b-a8ee-4183fee3463b/
b2267f49-045c-dbe7-5491-f5173f364a62/
ec6cc06c-d245-ca11-2b23-4eea29406073/
f63e8c3d-e709-0168-8be0-dca447da699f/
»Token and accessor data
Active tokens and their accessors are found under the path sys/raw/sys/token.
If you are familiar with inspecting data in Consul storage using the consul kv command or Consul HTTP API, you might be aware that some of that API offers recursive key listing, which helps with totaling up counts. The Vault API does not currently support this kind of recursive listing, however, so examples must be more focused and you need to manually total counts for multiple auth methods, secrets engines, leases, and so on.
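Because there is no recursive listing, a total has to be built by walking the tree one folder at a time. The following is a minimal sketch of that idea, assuming that vault list -format=json prints a JSON array of key names in which folders end with a slash, as in the other examples in this tutorial. The function name count_raw_keys is hypothetical. It issues only LIST operations, one per folder, so it can be slow on large data sets.

```shell
# Recursively count leaf keys below a sys/raw path using LIST only.
# Usage: count_raw_keys sys/raw/wal/
# The function body runs in a subshell so its variables stay local
# to each level of the recursion.
count_raw_keys() (
  set -f                      # disable globbing while splitting keys
  path="${1%/}"
  total=0
  keys=$(vault list -format=json "$path/" | jq -r '.[]') || exit 1
  for key in $keys; do
    case "$key" in
      */) sub=$(count_raw_keys "$path/$key") || exit 1
          total=$((total + sub)) ;;
      *)  total=$((total + 1)) ;;
    esac
  done
  echo "$total"
)
```

Pointing the function at a narrow subtree, rather than at sys/raw/ itself, keeps the number of requests manageable.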
Here is an example for counting active tokens.
Use curl in combination with jq to count active tokens like this.
$ curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request LIST \
    "$VAULT_ADDR/v1/sys/raw/sys/token/id/" \
    | jq '.data.keys | length'
32
Here is an example for counting active token accessors.
Use curl in combination with jq to count active token accessors like this.
$ curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request LIST \
    "$VAULT_ADDR/v1/sys/raw/sys/token/accessor/" \
    | jq '.data.keys | length'
32
»Lease data
The following are some examples for getting information about leases associated with auth methods. As previously mentioned in this tutorial, it is not possible to recursively list keys with the /sys/raw API, so you must be specific and manually total multiple paths when necessary.
»Count auth method leases
Here is an example of counting the leases for an existing AppRole auth method that is enabled at the default path approle.
Use vault list in combination with jq to count leases in the specified auth method.
$ vault list -format=json sys/raw/sys/expire/id/auth/approle/login/ \
| jq '. | length'
10
There are 10 active leases in Vault for this approle auth method. You can use the different names of your auth method paths from your own auth method list output to check leases in other auth methods.
»Write Ahead Log data
Write Ahead Logs (WALs) are found under the path sys/raw/wal/logs/.
»Count Write Ahead Logs
First, here is a plain list output example.
$ vault list -format=json sys/raw/wal/logs/
In this case, the output is a single containing key named 00000000, in which each individual WAL object resides.
["00000000/"]
Listing the contents of this key and taking the length of the result gives the count of WALs in 00000000.
Use vault list in combination with jq to get a count of Write Ahead Logs (WALs) from the storage with a command like this.
$ vault list -format=json sys/raw/wal/logs/00000000 | jq '. | length'
360
These examples should be enough to get you started in inspecting your own Vault data when it becomes necessary to get specific answers to aid in troubleshooting.
»What about Vault Enterprise namespaces?
Beginning with Vault Enterprise version 0.11.0, the concept of Namespaces was introduced.
This changes the previous procedures slightly, in that each namespace will encapsulate its own leases and tokens in paths under the namespace internal storage path name, which you should note is not the same as its user-configured name.
This is an example tree of paths from a minimal Vault instance for purposes of illustration:
├── namespaces
│   ├── 5Gsx8
│   │   └── sys
│   │       ├── expire
│   │       │   └── id
│   │       │       ├── auth
│   │       │       │   └── approle
│   │       │       │       └── login
│   │       │       │           ├── _h77ec2e32d144d6d75873a8b2098996b14ee70ed18addf6a4f5dd99d678d28b0b.5Gsx8
│   │       │       │           ├── _h8fba78bf0d4e497eb662709bf9e0f1e1ae0b053ec7cb3dbd3bc45ec76c66b693.5Gsx8
│   │       │       │           ├── _hf15a2ad701c9ec86089a5eb4704d2fbbbcff1fe66d118d005107a7dc83fb21ec.5Gsx8
│   │       │       │           └── _hf2b6bbafc5db6125191e09cbc7eaee9a080953217df4a105601c1fbf8b8ce492.5Gsx8
│   │       │       └── database
│   │       │           └── creds
│   │       │               └── mongodb-example-namespace-readonly
│   │       │                   ├── _HoNqW43Gy4N5Mbg3LzOEpuLa.5Gsx8
│   │       │                   ├── _OKHi04a98CLO2CYMMXHwduNj.5Gsx8
│   │       │                   ├── _YbfCjSZKwHUJeoAB31wIwn7f.5Gsx8
│   │       │                   └── _nNcN5KNlzh47JwhmGVMIMvkV.5Gsx8
│   │       ├── policy
│   │       │   ├── _control-group
│   │       │   ├── _default
│   │       │   └── _response-wrapping
│   │       └── token
│   │           ├── _salt
│   │           ├── accessor
│   │           │   ├── _h61e29be9d79c1d597cefab1ae8242fc8dffa6cb72ea1514260f1250632401638
│   │           │   ├── _ha6ddbad7b6237b0de4c03faf9cb82ae87aca4a8912b9c4b09c9f6b00dbe7c32e
│   │           │   ├── _ha9d7d995ddc93c4e136d686fc24ef4ca9f28e4793358b49e68c7e70e13843815
│   │           │   └── _he05dc5ca4f1ba60b20e406d4215d4d469a2a26231fab8f5067a97893147d95ea
│   │           └── id
│   │               ├── _h77ec2e32d144d6d75873a8b2098996b14ee70ed18addf6a4f5dd99d678d28b0b
│   │               ├── _h8fba78bf0d4e497eb662709bf9e0f1e1ae0b053ec7cb3dbd3bc45ec76c66b693
│   │               ├── _hf15a2ad701c9ec86089a5eb4704d2fbbbcff1fe66d118d005107a7dc83fb21ec
│   │               └── _hf2b6bbafc5db6125191e09cbc7eaee9a080953217df4a105601c1fbf8b8ce492
│   └── _info
The actual user-configured namespace name is example-namespace, but this is stored internally as a short unique identifier instead; in the example above, this value is 5Gsx8.
Once you have determined the storage path for the namespace, you can compose commands similar to those previously shown for the root namespace against your own namespaces.
Here is an example of counting the leases for an existing AppRole auth method that is enabled at the default path approle in the example-namespace (5Gsx8) namespace.
Use vault list in combination with jq to count leases in the specified auth method.
$ vault list -format=json \
sys/raw/namespaces/5Gsx8/sys/expire/id/auth/approle/login \
| jq '. | length'
10
Likewise for a count of active tokens, the following example can be used as a starting point.
$ vault list -format=json sys/raw/namespaces/5Gsx8/sys/token/id/ \
| jq '. | length'
10
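The namespace paths in the two examples above follow the same pattern, which a small helper can make explicit. This is only a sketch; ns_raw_path is a hypothetical function name, and the namespace identifier must be the internal storage identifier (5Gsx8 in this example), not the user-configured name.

```shell
# Build a sys/raw path inside a namespace from its internal storage ID.
# Usage: ns_raw_path NAMESPACE_ID STORAGE_PATH
ns_raw_path() {
  ns_id="$1"
  path="${2#/}"
  printf 'sys/raw/namespaces/%s/%s\n' "$ns_id" "$path"
}
```

For instance, vault list -format=json "$(ns_raw_path 5Gsx8 sys/token/id/)" targets the same path as the token count example.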
After inspecting data, you can move on to stopping the recovery mode server, starting it normally, and rejoining other servers which participate in the cluster.
»Stop recovery mode server and start normally
Starting Vault in recovery mode with Integrated Storage resets the cluster member list, effectively reducing cluster members to 1 as described in the documentation on Raft rejoin.
Once you have completed inspection of your Vault data, you can stop the recovery mode server and then start it again normally (i.e., without the -recovery flag) so that you can rejoin the other cluster servers to it, and re-establish a highly available cluster.
When the Vault server is successfully started for normal operations, ensure that it is unsealed and active with vault status or the /sys/seal-status API before joining the standby servers to it.
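The seal check can be scripted using the exit code of vault status, which is 0 when the server is unsealed, 2 when it is sealed, and 1 on error. The sketch below relies only on that exit code; the function name check_unsealed is hypothetical, and it does not confirm that the node is active, which you would still read from the vault status output.

```shell
# Report the seal state based on the exit code of "vault status":
# 0 = unsealed, 2 = sealed, anything else = error.
check_unsealed() {
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed"; return 2 ;;
    *) echo "error"; return 1 ;;
  esac
}
```

A simple gate before joining standbys might be check_unsealed || exit 1.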
»Start and join standby servers
With one Vault server active and unsealed, you can now proceed to joining the standby servers. Please note that due to the cluster size change introduced when using recovery mode, you must ensure that the standby servers are free of all existing Vault data before you attempt to join them to the active server, from which they will receive the updated data.
You have two choices for meeting this requirement.
- Join all new standby servers; this is most helpful if, for example, there are uncorrectable issues with the existing servers.
- Remove the contents of the path directory that holds data for each of the standby Vault servers before starting them.
Once you have decided and implemented a strategy, go ahead and start the standby Vault servers and join each of them to the active server.
NOTE: If the connection details of all Vault servers are known beforehand, you can configure retry_join inside the storage stanza to automatically join the cluster.
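If you join the standby servers manually instead, each one is joined by sending vault operator raft join to that standby with the API address of the active server as the argument. The following is a minimal sketch; the function name join_standbys is hypothetical, and the addresses in the usage line assume the default API port 8200 on the example hosts.

```shell
# Join each standby to the active server.
# Usage: join_standbys LEADER_API_ADDR STANDBY_API_ADDR...
# VAULT_ADDR selects which standby receives the join request.
join_standbys() {
  leader="$1"
  shift
  for standby in "$@"; do
    echo "Joining $standby to $leader"
    VAULT_ADDR="$standby" vault operator raft join "$leader" || return 1
  done
}
```

An example call might be join_standbys http://10.10.42.200:8200 http://10.10.42.201:8200 http://10.10.42.202:8200. Depending on your seal type, each standby may also need to be unsealed after it joins.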
»Validate cluster
After joining all standby servers to the active server, you can validate your cluster health with vault operator raft list-peers.
Example:
$ vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
vault-0 10.10.42.200:8201 leader true
vault-1 10.10.42.201:8201 follower true
vault-2 10.10.42.202:8201 follower true
vault-3 10.10.42.203:8201 follower true
vault-4 10.10.42.204:8201 follower true
The example output shows that all 5 servers are cluster members and that vault-0 is the active leader.
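To confirm that the cluster is back to its original size without reading the table by eye, you can count the voters. This sketch assumes the four-column table layout shown in the example output (Node, Address, State, Voter); the function name count_voters is hypothetical.

```shell
# Count voting peers in the table printed by
# "vault operator raft list-peers".
# Skips the two header lines and counts rows whose fourth column is "true".
count_voters() {
  vault operator raft list-peers \
    | awk 'NR > 2 && $4 == "true" { n++ } END { print n + 0 }'
}
```

For the example cluster, the result should match the expected number of servers, which is 5.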
»Summary
In this tutorial, you learned how Vault stores its operational data in the Integrated Storage backend, and how to access these data through the /sys/raw API endpoint with Vault operating in recovery mode.
You should be able to use what you have learned here as a starting point in inspecting and measuring the characteristics most important to you about your Vault data.