Production installations of Vault typically operate with one or more enabled audit devices which log details of all external requests and responses for the specific purpose of performing audits on those data.
NOTE: audit device logs are a separate and unrelated concept from Vault's operational logs, which are typically gathered by the operating system journal from standard output and standard error while Vault is running and contain a different set of information.
You can enable an audit device for output to a variety of destinations including static files, TCP, UDP, or Unix sockets, and syslog. Regardless of the configured destination, the resulting output is always in JSON format.
More details including an example audit device log entry can be found in the Troubleshooting Vault tutorial.
»Challenge
Audit device logs provide great details for troubleshooting, such as counts for operations against a specific Vault endpoint or the IP addresses of hosts responsible for generating specific errors.
These logs can be aggregated into a solution for human consumption with dashboards and querying, but without access to this solution it can be useful to have an alternative means of querying the logs in an ad-hoc manner during troubleshooting scenarios.
»Solution
The log file from an enabled File Audit Device can be queried in an ad-hoc manner using a common command line tool. Important details which could help resolve the troubleshooting scenario can still be accessed from the audit device log when other solutions for querying aggregated logs are unavailable.
»Notes and prerequisites
This tutorial explains how to query the logs written by a File Audit Device using the popular command line utility jq for data points found only in these logs which can be useful in troubleshooting.
To follow along with this tutorial you need the following.
- Download the example log file from vault-guides
- Install jq if not already installed
NOTE: This tutorial is written with examples for the Linux operating system, but the same principles can be applied in other operating systems and jq
is a cross platform tool so these examples will work on other platforms as well.
»Example log file preparation
Whenever an audit device log filename is referenced in examples, it will appear as $AUDIT_LOG_FILE
; you should replace this value with the actual filename of the audit device log from the vault-guides repository so that examples work as-is.
First, clone the repository.
$ git clone https://github.com/hashicorp/vault-guides
Then export the AUDIT_LOG_FILE
environment variable to the example log file path.
$ export AUDIT_LOG_FILE=vault-guides/monitoring-troubleshooting/vault-audit.log
You are now prepared to try the examples shown in this tutorial.
»All errors and their timestamps
To begin with a simple example, you can list all non-null error
fields along with their corresponding timestamps to quickly gain insight into the volume and type of errors logged by the audit device.
$ jq 'select(.error != null) | [.time,.error]' $AUDIT_LOG_FILE
If this command returns nothing, then there are no errors present in the log file, otherwise results would resemble this example.
[
"2020-04-30T15:12:20.8285315Z",
"permission denied"
]
[
"2020-04-30T15:13:58.6176895Z",
"missing client token"
]
[
"2020-04-30T15:16:02.9638134Z",
"1 error occurred:\n\t* unsupported path\n\n"
]
[
"2020-04-30T19:12:55.8372206Z",
"permission denied"
]
[
"2020-04-30T19:12:55.8400091Z",
"1 error occurred:\n\t* permission denied\n\n"
]
[
"2020-04-30T19:12:58.5621393Z",
"permission denied"
]
[
"2020-04-30T19:12:58.5648483Z",
"1 error occurred:\n\t* permission denied\n\n"
]
In this simple example, you can observe that there are just a few errors which break down as follows:
- "permission denied": The client token used in the request lacks the required capabilities for the requested endpoint; this is perhaps the most common error you will encounter in audit device logs
- "missing client token": A request to an authenticated endpoint was made without providing a client token
- "unsupported path": A request was made against an auth method or secrets engine using an unsupported endpoint path (likely due to typo)
TIP: While this is just a simple and compact example of errors for the purpose of this tutorial, you can expect to find a much larger volume of more varied errors in a heavily used Vault environment.
»HMAC hashed errors and their timestamps
Sensitive information, including details returned in errors are all hashed with a salt using HMAC-SHA256 according to the Sensitive Information section of the Audit Devices documentation.
To locate these hashed errors and their timestamps, use a query like this example.
$ jq 'select(.response.data.error != null) | [.time,.response.data.error]' \
$AUDIT_LOG_FILE
The results show a few HMAC hashed errors, which appear as the hash algorithm type and the actual hashed information separated by a colon.
[
"2020-04-30T15:16:02.9638134Z",
"hmac-sha256:31b7804ad92159b62320f3387a53c881b1f5f8932845067a797e6722b7f54547"
]
[
"2020-04-30T15:18:01.8190422Z",
"hmac-sha256:8c436a490d2dd8e2410c5a67d2e2663a09f2e0e861cb4dbf6c224d02cc84f2e3"
]
[
"2020-04-30T15:22:42.4340489Z",
"hmac-sha256:8c436a490d2dd8e2410c5a67d2e2663a09f2e0e861cb4dbf6c224d02cc84f2e3"
]
[
"2020-04-30T19:12:55.8400091Z",
"hmac-sha256:a8578738d9cdb1fe85ef7e6dc2f4f1a9fe4a3743ef9acd99be4e57622f7dae8d"
]
[
"2020-04-30T19:12:58.5648483Z",
"hmac-sha256:a8578738d9cdb1fe85ef7e6dc2f4f1a9fe4a3743ef9acd99be4e57622f7dae8d"
]
You can use the hashed values compared against previously known error payloads to find matches.
For a simple example, suppose that you had a number of AppRole auth method login failures due to the error "invalid secret id", these would be included in the .response.data.error
field and would be hashed.
If you want to find the corresponding entries in the audit device log, you can use the /sys/audit-hash API to compare a known value with the hashes to find matches. Be sure to review the API documentation for more details.
»HMAC hash calculation example
NOTE: The HMAC used by an audit device is unique to that device. The example is provided here only for reference to the process involved to calculate a hash using the sys/audit-hash
API.
From the previous example output, note that the second and third entries have the same hash value (hmac-sha256:8c436a490d2dd8e2410c5a67d2e2663a09f2e0e861cb4dbf6c224d02cc84f2e3).
Assuming that a file audit device is enabled at the file
path, here is an example vault
command to compare this hash value with the string "invalid secret id" to learn if the two match.
$ vault write sys/audit-hash/file input="invalid secret id"
The command writes to the API and specifies the file audit device as the last part of the path to select the correct audit device for calculating hashes against, and provide the string "invalid secret id" as the value of input
.
Successful response:
Key Value
--- -----
hash hmac-sha256:8c436a490d2dd8e2410c5a67d2e2663a09f2e0e861cb4dbf6c224d02cc84f2e3
In the case of this example there is a match. You can conclude that the last 2 of the 3 example HMAC hashed error lines indeed contain the error "invalid secret id".
Again, you will need to use this technique on logs for which you still have access to the audit device that wrote them to successfully calculate hashes.
»Counts and specific details
Sometimes being able to group related items from the audit device logs by their count is helpful to spot outliers and other problems.
»Count all requests and responses
You can count the occurrences of requests and responses like this.
$ jq -n '[inputs | {Operation: .type}] | group_by(.Operation) | map({Operation: .[0].Operation, Count: length}) | .[]' $AUDIT_LOG_FILE
{
"Operation": "request",
"Count": 698
}
{
"Operation": "response",
"Count": 699
}
This combined number of requests and responses (1397) should equal the total lines in the file as shown by counting lines for example, with the wc
command.
$ wc -l $AUDIT_LOG_FILE
Successful output:
1397 vault-audit.log
»Response display names
To break out the authentication display name counts for responses, you can use something like this example query which sets up a map of display names and counts where the values are not null.
$ jq -n '[inputs | {DisplayName: .auth.display_name | select(. != null)} ] | group_by(.DisplayName) | map({DisplayName: .[0].DisplayName, Count: length}) | .[]' $AUDIT_LOG_FILE
Successful output:
{
"DisplayName": "approle",
"Count": 8
}
{
"DisplayName": "root",
"Count": 21
}
{
"DisplayName": "token",
"Count": 453
}
{
"DisplayName": "token-lab-admins",
"Count": 423
}
{
"DisplayName": "userpass-lab-user-1",
"Count": 11
}
{
"DisplayName": "userpass-lab-user-3",
"Count": 74
}
{
"DisplayName": "userpass-lab-user-4",
"Count": 1
}
{
"DisplayName": "userpass-lab-user-5",
"Count": 33
}
{
"DisplayName": "userpass-lab-user-7",
"Count": 144
}
It is evident from this output that the busiest display names accessing this Vault are token
, token-lab-admins
and userpass-lab-user-7
.
»Request operations
This query will break out all request operation
types by count.
$ jq -n '[inputs | {Operation: .request.operation} ] | group_by(.Operation) | map({Operation: .[0].Operation, Count: length}) | .[]' $AUDIT_LOG_FILE
Successful output:
{
"Operation": "create",
"Count": 274
}
{
"Operation": "delete",
"Count": 20
}
{
"Operation": "list",
"Count": 184
}
{
"Operation": "read",
"Count": 346
}
{
"Operation": "update",
"Count": 573
}
From this output, you can observe that most operations against this Vault server during the time period covered by the audit device logs are update
and read
, followed by create
, and list
, but with very few delete
operations occurring during the same time period.
»Request paths
Learning about hot API endpoints by counting their requests for access can be helpful for troubleshooting. Let's take a look at the top 5 most busy endpoints based on their request counts.
$ jq -n '[inputs | {Path: .request.path} ] | group_by(.Path) | map({Path: .[0].Path, Count: length}) | sort_by(-.Count) | limit(5;.[])' $AUDIT_LOG_FILE
Successful output:
{
"Path": "auth/userpass/login/lab-user-7",
"Count": 204
}
{
"Path": "sys/internal/ui/mounts/kv-v2",
"Count": 180
}
{
"Path": "kv-v2/metadata/",
"Count": 178
}
{
"Path": "auth/userpass/login/lab-user-3",
"Count": 148
}
{
"Path": "auth/userpass/login/lab-user-5",
"Count": 66
}
From this output you know that the hottest path represented by these logs is auth/userpass/login/lab-user-7
with 204 requests. This is representative of the lab-user-7
authenticating with the username and password auth method.
»Errors by count
You can query for errors and get their counts like this.
$ jq -n '[inputs | {Errors: .error} ] | group_by(.Errors) | map({Errors: .[0].Errors, Count: length}) | sort_by(-.Count) | .[]' $AUDIT_LOG_FILE
Successful output:
{
"Errors": null,
"Count": 1390
}
{
"Errors": "permission denied",
"Count": 3
}
{
"Errors": "1 error occurred:\n\t* permission denied\n\n",
"Count": 2
}
{
"Errors": "1 error occurred:\n\t* unsupported path\n\n",
"Count": 1
}
{
"Errors": "missing client token",
"Count": 1
}
Of note here, there are 1386 successful requests (where the error value is null) and 7 error occurrences. The "permission denied" errors are the result of using a token with insufficient capabilities to access and endpoint and you can view the complete request and response example to learn more about them.
The errors containing an asterisk (*
) character originate from the logical backend (auth method or secrets engine).
The first, "permission denied" is a permission denied error resulting from an attempt to access a list of enabled secrets engines. The second, "unsupported path" is from an attempt to list an unsupported endpoint in the AppRole auth method.
Finally, the "missing client token" resulted from an attempt to access an authenticated endpoint without providing a valid client token.
»Remote address by count
It can be handy to know the request frequency by the value of the remote_address
field in situations where inexplicable activity is occurring at a high volume, for example. This can help to identify the responsible host(s) for correction or mitigation of the issue.
$ jq -n '[inputs | {RemoteAddress: .request.remote_address} ] | group_by(.RemoteAddress) | map({RemoteAddress: .[0].RemoteAddress, Count: length}) | .[]' $AUDIT_LOG_FILE
Successful output:
{
"RemoteAddress": "10.10.42.210",
"Count": 4
}
{
"RemoteAddress": "10.10.42.212",
"Count": 2
}
{
"RemoteAddress": "10.10.42.213",
"Count": 2
}
{
"RemoteAddress": "10.10.42.222",
"Count": 1389
}
In the above example, a clear outlier is 10.10.42.222
, as it has generated the majority of requests.
»Path access by remote address
For a more complex query that breaks down API path access by count over each remote address, try this example. It produces a great deal of output, but you can use what you have learned here to modify it to return only the top 5 hottest paths for each host, for example.
$ jq -s 'group_by(.request.remote_address) | map({"remote_address": .[0].request.remote_address,"access": (group_by(.request.path) | map({"key":.[0].request.path,"value":length}) | from_entries)})' $AUDIT_LOG_FILE
Successful output:
[
{
"remote_address": "10.10.42.210",
"access": {
"auth/userpass/login/lab-user-1": 2,
"sys/internal/ui/mounts/kv-v2/lab": 2
}
},
{
"remote_address": "10.10.42.212",
"access": {
"auth/userpass/login/lab-user-5": 2
}
},
{
"remote_address": "10.10.42.213",
"access": {
"auth/userpass/login/lab-user-4": 2
}
},
{
"remote_address": "10.10.42.222",
"access": {
"approle/role/": 2,
"auth/approle/login": 20,
"auth/approle/role/my-role": 2,
"auth/approle/role/my-role/role-id": 4,
"auth/approle/role/my-role/secret-id": 4,
"auth/token/create": 2,
"auth/token/create/admins": 2,
"auth/token/lookup-self": 6,
"auth/token/roles/admins": 2,
"auth/userpass/login/lab-user-1": 20,
"auth/userpass/login/lab-user-3": 148,
"auth/userpass/login/lab-user-5": 64,
"auth/userpass/login/lab-user-7": 204,
"auth/userpass/users/lab-user-1": 2,
"auth/userpass/users/lab-user-2": 2,
"auth/userpass/users/lab-user-3": 2,
"auth/userpass/users/lab-user-4": 2,
"auth/userpass/users/lab-user-5": 2,
"auth/userpass/users/lab-user-6": 2,
"auth/userpass/users/lab-user-7": 2,
"cubbyhole/my-secret-1": 2,
"cubbyhole/my-secret-3": 6,
"kv-v2/data/lab-admin-1": 2,
"kv-v2/data/lab/lab-credential-1": 8,
"kv-v2/data/lab/lab-credential-2": 2,
"kv-v2/data/lab/user7-my-secret-3": 6,
"kv-v2/data/mysecret-1": 2,
"kv-v2/data/mysecret-2": 2,
"kv-v2/metadata/": 178,
"kv-v2/metadata/lab/": 2,
"kv/": 2,
"kv/event/event-1": 4,
"kv/event/event-10": 4,
"kv/event/event-2": 4,
"kv/event/event-3": 4,
"kv/event/event-4": 4,
"kv/event/event-5": 4,
"kv/event/event-6": 4,
"kv/event/event-7": 4,
"kv/event/event-8": 4,
"kv/event/event-9": 4,
"kv/lab/user7-my-secret-3": 6,
"kv/my-secret-1": 2,
"kv/my-secret-10": 2,
"kv/my-secret-2": 2,
"kv/my-secret-3": 2,
"kv/my-secret-4": 2,
"kv/my-secret-5": 2,
"kv/my-secret-6": 2,
"kv/my-secret-7": 2,
"kv/my-secret-8": 2,
"kv/my-secret-9": 2,
"pki/config/urls": 2,
"pki/roles/example-dot-com": 2,
"pki/root/generate/internal": 2,
"pki/root/sign-intermediate": 2,
"pki_int/config/urls": 2,
"pki_int/intermediate/generate/internal": 4,
"pki_int/intermediate/set-signed": 2,
"pki_int/issue/example-dot-com": 40,
"pki_int/roles/example-dot-com": 2,
"sys/audit/file": 1,
"sys/auth": 2,
"sys/auth/approle": 2,
"sys/auth/userpass": 2,
"sys/internal/ui/mounts/kv": 2,
"sys/internal/ui/mounts/kv-v2": 180,
"sys/internal/ui/mounts/kv-v2/lab": 2,
"sys/internal/ui/mounts/kv-v2/lab-admin-1": 2,
"sys/internal/ui/mounts/kv-v2/lab/lab-credential-1": 8,
"sys/internal/ui/mounts/kv-v2/lab/lab-credential-2": 2,
"sys/internal/ui/mounts/kv-v2/lab/user7-my-secret-3": 6,
"sys/internal/ui/mounts/kv-v2/mysecret-1": 2,
"sys/internal/ui/mounts/kv-v2/mysecret-2": 2,
"sys/internal/ui/mounts/kv/event/event-1": 4,
"sys/internal/ui/mounts/kv/event/event-10": 4,
"sys/internal/ui/mounts/kv/event/event-2": 4,
"sys/internal/ui/mounts/kv/event/event-3": 4,
"sys/internal/ui/mounts/kv/event/event-4": 4,
"sys/internal/ui/mounts/kv/event/event-5": 4,
"sys/internal/ui/mounts/kv/event/event-6": 4,
"sys/internal/ui/mounts/kv/event/event-7": 4,
"sys/internal/ui/mounts/kv/event/event-8": 4,
"sys/internal/ui/mounts/kv/event/event-9": 4,
"sys/internal/ui/mounts/kv/lab/user7-my-secret-3": 6,
"sys/internal/ui/mounts/kv/my-secret-1": 2,
"sys/internal/ui/mounts/kv/my-secret-10": 2,
"sys/internal/ui/mounts/kv/my-secret-2": 2,
"sys/internal/ui/mounts/kv/my-secret-3": 2,
"sys/internal/ui/mounts/kv/my-secret-4": 2,
"sys/internal/ui/mounts/kv/my-secret-5": 2,
"sys/internal/ui/mounts/kv/my-secret-6": 2,
"sys/internal/ui/mounts/kv/my-secret-7": 2,
"sys/internal/ui/mounts/kv/my-secret-8": 2,
"sys/internal/ui/mounts/kv/my-secret-9": 2,
"sys/mounts": 4,
"sys/mounts/kv": 2,
"sys/mounts/kv-v2": 2,
"sys/mounts/pki": 2,
"sys/mounts/pki/tune": 2,
"sys/mounts/pki_int": 2,
"sys/mounts/pki_int/tune": 2,
"sys/mounts/totp": 2,
"sys/mounts/transit": 2,
"sys/policies/acl/admins": 2,
"sys/policies/acl/lab-user": 2,
"totp/code/my-key": 50,
"totp/keys/my-key": 2,
"totp/keys/my-user": 2,
"transit/encrypt/my-key-1": 2,
"transit/encrypt/my-key-10": 2,
"transit/encrypt/my-key-100": 2,
"transit/encrypt/my-key-11": 2,
"transit/encrypt/my-key-12": 2,
"transit/encrypt/my-key-13": 2,
"transit/encrypt/my-key-14": 2,
"transit/encrypt/my-key-15": 2,
"transit/encrypt/my-key-16": 2,
"transit/encrypt/my-key-17": 2,
"transit/encrypt/my-key-18": 2,
"transit/encrypt/my-key-19": 2,
"transit/encrypt/my-key-2": 2,
"transit/encrypt/my-key-20": 2,
"transit/encrypt/my-key-21": 2,
"transit/encrypt/my-key-22": 2,
"transit/encrypt/my-key-23": 2,
"transit/encrypt/my-key-24": 2,
"transit/encrypt/my-key-25": 2,
"transit/encrypt/my-key-26": 2,
"transit/encrypt/my-key-27": 2,
"transit/encrypt/my-key-28": 2,
"transit/encrypt/my-key-29": 2,
"transit/encrypt/my-key-3": 2,
"transit/encrypt/my-key-30": 2,
"transit/encrypt/my-key-31": 2,
"transit/encrypt/my-key-32": 2,
"transit/encrypt/my-key-33": 2,
"transit/encrypt/my-key-34": 2,
"transit/encrypt/my-key-35": 2,
"transit/encrypt/my-key-36": 2,
"transit/encrypt/my-key-37": 2,
"transit/encrypt/my-key-38": 2,
"transit/encrypt/my-key-39": 2,
"transit/encrypt/my-key-4": 2,
"transit/encrypt/my-key-40": 2,
"transit/encrypt/my-key-41": 2,
"transit/encrypt/my-key-42": 2,
"transit/encrypt/my-key-43": 2,
"transit/encrypt/my-key-44": 2,
"transit/encrypt/my-key-45": 2,
"transit/encrypt/my-key-46": 2,
"transit/encrypt/my-key-47": 2,
"transit/encrypt/my-key-48": 2,
"transit/encrypt/my-key-49": 2,
"transit/encrypt/my-key-5": 2,
"transit/encrypt/my-key-50": 2,
"transit/encrypt/my-key-51": 2,
"transit/encrypt/my-key-52": 2,
"transit/encrypt/my-key-53": 2,
"transit/encrypt/my-key-54": 2,
"transit/encrypt/my-key-55": 2,
"transit/encrypt/my-key-56": 2,
"transit/encrypt/my-key-57": 2,
"transit/encrypt/my-key-58": 2,
"transit/encrypt/my-key-59": 2,
"transit/encrypt/my-key-6": 2,
"transit/encrypt/my-key-60": 2,
"transit/encrypt/my-key-61": 2,
"transit/encrypt/my-key-62": 2,
"transit/encrypt/my-key-63": 2,
"transit/encrypt/my-key-64": 2,
"transit/encrypt/my-key-65": 2,
"transit/encrypt/my-key-66": 2,
"transit/encrypt/my-key-67": 2,
"transit/encrypt/my-key-68": 2,
"transit/encrypt/my-key-69": 2,
"transit/encrypt/my-key-7": 2,
"transit/encrypt/my-key-70": 2,
"transit/encrypt/my-key-71": 2,
"transit/encrypt/my-key-72": 2,
"transit/encrypt/my-key-73": 2,
"transit/encrypt/my-key-74": 2,
"transit/encrypt/my-key-75": 2,
"transit/encrypt/my-key-76": 2,
"transit/encrypt/my-key-77": 2,
"transit/encrypt/my-key-78": 2,
"transit/encrypt/my-key-79": 2,
"transit/encrypt/my-key-8": 2,
"transit/encrypt/my-key-80": 2,
"transit/encrypt/my-key-81": 2,
"transit/encrypt/my-key-82": 2,
"transit/encrypt/my-key-83": 2,
"transit/encrypt/my-key-84": 2,
"transit/encrypt/my-key-85": 2,
"transit/encrypt/my-key-86": 2,
"transit/encrypt/my-key-87": 2,
"transit/encrypt/my-key-88": 2,
"transit/encrypt/my-key-89": 2,
"transit/encrypt/my-key-9": 2,
"transit/encrypt/my-key-90": 2,
"transit/encrypt/my-key-91": 2,
"transit/encrypt/my-key-92": 2,
"transit/encrypt/my-key-93": 2,
"transit/encrypt/my-key-94": 2,
"transit/encrypt/my-key-95": 2,
"transit/encrypt/my-key-96": 2,
"transit/encrypt/my-key-97": 2,
"transit/encrypt/my-key-98": 2,
"transit/encrypt/my-key-99": 2,
"transit/keys/my-key-1": 2
}
}
]
»Tips
- Feel free to browse the example log file in your favorite JSON exploration tool to get an idea of the shape and structure of audit device data, the fields contained, and so on. It will help you later to quickly identify key areas when interpreting data from the logs.
- Explore the jq manual to learn about building more advanced and specific queries
»Summary
In this tutorial, you learned how to use a common command-line utility to query a file audit device log. You can use what you have learned here to expand your understanding of the Vault audit device logs and as a tool for troubleshooting scenarios.