This guide applies to Vault versions 1.6 - 1.8.
This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Vault using the Integrated Storage (Raft) storage backend in a production environment.
This guide includes general guidance as well as specific recommendations for popular cloud infrastructure platforms. These recommendations have also been encoded into official Terraform modules for AWS, Azure, and GCP.
NOTE: If you are deploying Vault to Kubernetes, please refer to the Vault on Kubernetes Reference Architecture.
The following diagram shows the recommended architecture for deploying a single Vault cluster with maximum resiliency:
With five nodes in the Vault cluster distributed between three availability zones, this architecture can withstand the loss of two nodes from within the cluster or the loss of an entire availability zone.
If deploying to three availability zones is not possible, the same architecture may be used across two availability zones or within a single one, at the cost of a significant reliability risk if an availability zone fails.
For Vault Enterprise customers, additional resiliency is possible by implementing a multi-cluster architecture, which allows for additional performance and disaster recovery options. See the Multi-Cluster Architecture Guide for more information.
This section contains specific hardware capacity recommendations, network requirements, and additional infrastructure considerations. Since every hosting environment is different and every customer's Vault usage profile is different, these recommendations should only serve as a starting point from which each customer's operations staff may observe and adjust to meet the unique needs of each deployment.
»Hardware sizing for Vault servers
Sizing recommendations have been divided into two common cluster sizes.
Small clusters would be appropriate for most initial production deployments or for development and testing environments.
Large clusters are production environments with a consistently high workload. That might be a large number of transactions, a large number of secrets, or a combination of the two.
|Size||CPU||Memory||Disk Capacity||Disk IO||Disk Throughput|
|Small||2-4 core||8-16 GB RAM||100+ GB||3000+ IOPS||75+ MB/s|
|Large||4-8 core||32-64 GB RAM||200+ GB||10000+ IOPS||250+ MB/s|
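As a rough illustration of how the table above might be used, the following Python sketch compares a host's specifications against the lower bound of each column. The `HostSpec` type, its field names, and the `shortfalls` helper are hypothetical names introduced here; only the numeric minimums come from the table.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of one Vault server's hardware."""
    cpu_cores: int
    memory_gb: int
    disk_gb: int
    disk_iops: int
    disk_mbps: int


# Lower bounds taken from the sizing table (the "2-4 core" row becomes 2, and so on).
MINIMUMS = {
    "small": HostSpec(cpu_cores=2, memory_gb=8, disk_gb=100, disk_iops=3000, disk_mbps=75),
    "large": HostSpec(cpu_cores=4, memory_gb=32, disk_gb=200, disk_iops=10000, disk_mbps=250),
}


def shortfalls(host: HostSpec, size: str) -> List[str]:
    """Return the names of the fields where `host` is below the minimum for `size`."""
    minimum = MINIMUMS[size]
    return [
        f.name
        for f in fields(HostSpec)
        if getattr(host, f.name) < getattr(minimum, f.name)
    ]


if __name__ == "__main__":
    candidate = HostSpec(cpu_cores=4, memory_gb=16, disk_gb=100, disk_iops=3000, disk_mbps=75)
    for size in MINIMUMS:
        print(size, shortfalls(candidate, size) or "meets minimums")
```

Meeting the minimums is only a starting point. As noted elsewhere in this guide, actual requirements depend on the observed workload.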
For each cluster size, the following table gives recommended hardware specs for each major cloud infrastructure provider.
|Provider||Size||Instance/VM Types||Disk Volume Specs|
NOTE: For GCP and Azure recommendations, the disk sizes listed are larger than the minimum size recommended, because for the recommended disk type, available IOPS increases with disk capacity, and the listed sizes are necessary to provision the required IOPS.
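The relationship described in this note can be sketched with simple arithmetic. The function below assumes a disk type whose IOPS grow linearly with provisioned capacity. The `iops_per_gb` rate of 30 used in the example is purely illustrative and is not a figure from this guide or a quoted provider limit, so check the provider's current documentation for real values.

```python
import math


def capacity_for_iops(required_iops: int, iops_per_gb: float, minimum_gb: int) -> int:
    """Smallest disk size (GB) that satisfies both the capacity and the IOPS target.

    Assumes IOPS scale linearly with capacity at `iops_per_gb` (an assumption).
    """
    if iops_per_gb <= 0:
        raise ValueError("iops_per_gb must be positive")
    return max(minimum_gb, math.ceil(required_iops / iops_per_gb))


if __name__ == "__main__":
    # Hypothetical rate: 30 IOPS per provisioned GB.
    print(capacity_for_iops(required_iops=3000, iops_per_gb=30, minimum_gb=100))
    print(capacity_for_iops(required_iops=10000, iops_per_gb=30, minimum_gb=200))
```

With a rate like this, the large-cluster IOPS target forces a disk bigger than the 200 GB capacity minimum, which is the effect the note describes.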
NOTE: For predictable performance on cloud providers, it's recommended to avoid "burstable" CPU and storage options (such as AWS t3 instance types) whose performance may degrade rapidly under continuous load.
In general, CPU and storage performance requirements will depend on the customer's exact usage profile (e.g., types of requests, average request rate, and peak request rate). Memory requirements depend on the total size of the data held in memory and should be sized accordingly.
When using Integrated Storage, the Vault servers should have a relatively high-performance disk subsystem. If many secrets are being generated or rotated frequently, this information will need to be flushed to disk often, and slower storage systems will significantly impact performance.
In addition, HashiCorp strongly recommends configuring Vault with audit logging enabled. The impact of the additional storage I/O from audit logging will vary depending on your particular pattern of requests. For best performance, audit logs should be written to a separate disk.
»Network latency and bandwidth
In order for cluster members to stay properly in sync, network latency between availability zones should be less than eight milliseconds (8 ms).
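One way to spot-check this figure is to time TCP connections between nodes in different availability zones. The sketch below is an approximation under stated assumptions: TCP connect time is used as a stand-in for network round-trip latency, the median of several samples is compared against the 8 ms budget, and the peer address in the example is a placeholder.

```python
import socket
import statistics
import time
from typing import Iterable

LATENCY_BUDGET_MS = 8.0  # threshold stated in this guide


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def within_budget(samples_ms: Iterable[float], budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if the median sample is below the latency budget."""
    return statistics.median(list(samples_ms)) < budget_ms


if __name__ == "__main__":
    # Placeholder peer address; replace with a Vault node in another zone.
    peer_host, peer_port = "vault-2.example.internal", 8201
    try:
        samples = [tcp_connect_ms(peer_host, peer_port) for _ in range(5)]
        print("within budget:", within_budget(samples))
    except OSError as err:
        print("could not reach peer:", err)
```

Using the median keeps a single slow sample from failing the check. A stricter policy could compare a high percentile instead.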
The amount of network bandwidth used by Vault will depend entirely on the specific customer's usage patterns. In many cases, even a high request volume will not translate to a large amount of network bandwidth consumption. However, all data written to Vault will be replicated to all cluster members. It's also important to consider bandwidth requirements to other external systems such as monitoring and logging collectors. And finally, a multi-cluster Vault setup will require Vault datasets to be transmitted between clusters to provide Performance and DR Replication.
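For a first-order feel of the replication cost mentioned above, the following sketch estimates the outbound traffic of the Raft leader. It assumes the leader sends each written byte once to every other cluster member and ignores protocol overhead, snapshots, request forwarding, and multi-cluster replication, so treat the result as a lower bound rather than a sizing rule.

```python
def leader_replication_egress(write_bytes_per_sec: float, cluster_size: int) -> float:
    """Rough lower bound on leader egress (bytes/s) caused by Raft replication.

    Assumption: every written byte is shipped once to each of the other nodes.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return write_bytes_per_sec * (cluster_size - 1)


if __name__ == "__main__":
    # Example: 1 MB/s of writes on the five-node cluster recommended in this guide.
    print(leader_replication_egress(1_000_000, 5))
```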
The following table outlines the network connectivity requirements for Vault cluster nodes. If general network egress is restricted, particular attention must be paid to granting outgoing access from the Vault servers to any external integration providers (for example, authentication and secret provider backends) as well as external log handlers, metrics collection, security and config management providers, and backup and restore systems.
|Source||Destination||Port||Protocol||Direction||Purpose|
|Client machines||Load balancer||443||tcp||incoming||Request distribution|
|Load balancer||Vault servers||8200||tcp||incoming||Vault API|
|Vault servers||Vault servers||8200||tcp||bidirectional||Cluster bootstrapping|
|Vault servers||Vault servers||8201||tcp||bidirectional||Raft, replication, request forwarding|
|Vault servers||External systems||various||various||various||External APIs|
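The connectivity rows above can also be kept as data so that firewall or security-group definitions are derived from one place. The sketch below is one possible encoding. The `Rule` field names are labels chosen here for the six columns, and `listener_ports` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    source: str
    destination: str
    port: str
    protocol: str
    direction: str
    purpose: str


# Rows copied from the connectivity table.
RULES = [
    Rule("Client machines", "Load balancer", "443", "tcp", "incoming", "Request distribution"),
    Rule("Load balancer", "Vault servers", "8200", "tcp", "incoming", "Vault API"),
    Rule("Vault servers", "Vault servers", "8200", "tcp", "bidirectional", "Cluster bootstrapping"),
    Rule("Vault servers", "Vault servers", "8201", "tcp", "bidirectional",
         "Raft, replication, request forwarding"),
    Rule("Vault servers", "External systems", "various", "various", "various", "External APIs"),
]


def listener_ports(destination: str) -> List[int]:
    """Sorted fixed TCP ports on which `destination` must accept connections."""
    return sorted({
        int(rule.port)
        for rule in RULES
        if rule.destination == destination
        and rule.protocol == "tcp"
        and rule.port.isdigit()
    })


if __name__ == "__main__":
    for target in ("Load balancer", "Vault servers"):
        print(target, listener_ports(target))
```

Rows whose port is "various" are skipped on purpose, since egress to external systems has to be enumerated per integration.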
»Network traffic encryption
All Vault-related network traffic should be encrypted along every segment. From client machines to the load balancer, and from the load balancer to the Vault servers, standard HTTPS TLS encryption can be used.
For communication between Vault servers (port 8201 by default) including Raft gossip, data replication, and request forwarding traffic, Vault automatically negotiates an mTLS connection when new servers join the cluster initially via the API address port (8200 by default).
»Load balancer recommendations
For the highest levels of reliability and stability, it is highly recommended to use some load balancing technology to distribute requests to your Vault cluster members. Each major cloud platform provides good options for managed load balancer services, or there are a number of self-hosted options as well as service discovery systems like Consul.
If you choose to terminate TLS at your load balancer, it is strongly recommended to also use TLS for the connection from the load balancer to Vault, to minimize the exposure of secret content on your network.
To monitor the health of Vault cluster nodes, the load balancer should be configured to poll the /v1/sys/health API endpoint to detect the status of the node and direct traffic accordingly. Refer to the sys/health API documentation for specific details on the query options and response codes and their meanings.
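To make the health-check behaviour concrete, here is a minimal Python sketch of the decision a load balancer effectively makes. The status-code meanings listed are the endpoint's default response codes as I understand them and should be confirmed against the sys/health API documentation for your Vault version. The `should_route` policy and the example address are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Default sys/health response codes (verify against the API documentation).
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary, active",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe(status_code: int) -> str:
    """Human-readable meaning of a sys/health status code."""
    return HEALTH_STATES.get(status_code, "unknown")


def should_route(status_code: int, allow_standby: bool = False) -> bool:
    """Example policy: always route to the active node, optionally to standbys."""
    if status_code == 200:
        return True
    return allow_standby and status_code in (429, 473)


def probe(base_url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of <base_url>/v1/sys/health without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/v1/sys/health", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    try:
        code = probe("https://vault-1.example.internal:8200")  # placeholder address
        print(code, describe(code), should_route(code))
    except OSError as err:
        print("node unreachable:", err)
```

Whether standby nodes should receive traffic is a design choice. Managed load balancers usually express the same policy as a list of status codes treated as healthy.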
As of Vault 1.7, in a cloud-based environment, it is recommended to use a managed scaling service (such as Auto Scaling Groups on AWS) to keep your Vault cluster populated with healthy instances. However, because of the nature of the Integrated Storage backend, it's important not to replace all instances in the managed scaling group too quickly to avoid having to restore data from a snapshot.
NOTE: Auto-server cleanup is not enabled by default when using Integrated Storage. The feature must be enabled after cluster initialization via the Raft Autopilot API. Also see the Integrated Storage Autopilot Tutorial for more details.
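As a hedged sketch of what enabling dead-server cleanup could look like, the code below builds a request for the Raft Autopilot configuration endpoint. The field names (`cleanup_dead_servers`, `dead_server_last_contact_threshold`, `min_quorum`) and the `sys/storage/raft/autopilot/configuration` path reflect my reading of the Autopilot API and should be checked against the documentation for your Vault version. The threshold value, address, and token handling are illustrative assumptions.

```python
import json
import urllib.request


def autopilot_payload(min_quorum: int,
                      cleanup_dead_servers: bool = True,
                      dead_server_last_contact_threshold: str = "10m") -> dict:
    """Build an Autopilot configuration body (values here are illustrative)."""
    if cleanup_dead_servers and min_quorum < 3:
        raise ValueError("min_quorum should be at least 3 when cleanup is enabled")
    return {
        "cleanup_dead_servers": cleanup_dead_servers,
        "dead_server_last_contact_threshold": dead_server_last_contact_threshold,
        "min_quorum": min_quorum,
    }


def build_request(vault_addr: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request carrying the configuration."""
    return urllib.request.Request(
        url=vault_addr.rstrip("/") + "/v1/sys/storage/raft/autopilot/configuration",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address and token; a five-node cluster is assumed for min_quorum.
    request = build_request("https://vault.example.internal:8200", "s.example-token",
                            autopilot_payload(min_quorum=5))
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending is left out on purpose: urllib.request.urlopen(request)
```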
For scaling the performance of your Vault cluster, there are two factors to consider. Adding additional members to the Vault cluster will not increase performance for any activity that triggers writes to the Vault storage backend. However, for Vault Enterprise customers, adding performance standby nodes can provide horizontal scalability for read requests within a Vault cluster.
»Failure tolerance characteristics
When deploying a Vault cluster, it's important to consider and design for your specific requirements for various failure scenarios:
»Node failure
The Integrated Storage backend for Vault allows for individual node failure by replicating all data between each node of the cluster. If the leader node fails, the remaining cluster members will elect a new leader following the Raft protocol. To allow for the failure of up to two nodes in the cluster, the ideal size is five nodes for a Vault cluster using Integrated Storage.
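The five-node recommendation follows from Raft's majority rule: a cluster of n voting nodes needs floor(n/2) + 1 of them available, so it tolerates n minus that many failures. A short Python sketch of the arithmetic (function names are chosen here for illustration):

```python
def quorum(cluster_size: int) -> int:
    """Number of voting nodes that must be available for Raft to make progress."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def failure_tolerance(cluster_size: int) -> int:
    """Number of nodes that can fail while a quorum remains."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in range(1, 8):
        print(f"nodes={size} quorum={quorum(size)} tolerated failures={failure_tolerance(size)}")
```

Note that an even-sized cluster tolerates no more failures than the odd size just below it, which is why odd sizes such as three and five are the usual choices.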
»Availability zone failure
By deploying a Vault cluster in the recommended architecture across three availability zones, the Raft consensus algorithm should be able to maintain consistency and availability given the failure of any one availability zone.
In cases where deployment across three zones is not possible, the failure of an availability zone may cause the Vault cluster to become inaccessible or unable to elect a leader. In a two availability zone deployment of five nodes, for example, one zone necessarily hosts the majority of the nodes. If that zone fails, the cluster loses its Raft quorum and is unable to service requests, so the failure of one of the two zones has a 50% chance of taking the cluster down.
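The effect of zone placement can be checked with the same majority rule. The sketch below takes a list of node counts per availability zone and reports, for each zone, whether the cluster keeps quorum if that zone is lost. The 2-2-1 split used for the three-zone case is an assumed layout for five nodes, not one specified in the text.

```python
from typing import Dict, List


def survives_zone_loss(nodes_per_zone: List[int]) -> Dict[int, bool]:
    """Map each zone index to whether quorum remains after losing that zone."""
    total = sum(nodes_per_zone)
    needed = total // 2 + 1  # Raft majority
    return {
        zone: (total - count) >= needed
        for zone, count in enumerate(nodes_per_zone)
    }


if __name__ == "__main__":
    layouts = {
        "three zones (2-2-1, assumed)": [2, 2, 1],
        "two zones (3-2)": [3, 2],
        "one zone (5)": [5],
    }
    for name, layout in layouts.items():
        print(name, survives_zone_loss(layout))
```

In the two-zone case only the loss of the smaller zone is survivable, which matches the one-in-two outcome described above.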
»Region or cluster failure
In the event of a failure of an entire region or cluster, Vault Enterprise provides replication features that can help provide resiliency across multiple clusters and/or regions. Please see the Multi-Cluster Architecture Guide for more information.
»External token storage
The Tokenization transformation feature reached General Availability in Vault 1.7. This feature introduces additional architectural considerations.
The tokenization feature requires an external data store to facilitate the mapping of tokens to cryptographic values. Be sure to architect your external data stores for high availability. Where possible, follow reliability and disaster-recovery architectural patterns that meet the same requirements you have for Vault itself. To ensure data consistency, the backup cadence of the external data store must be kept in sync with backups of Vault.
A Vault cluster is a set of Vault processes that together run a Vault service. These Vault processes could be running on physical or virtual servers or in containers.
An availability zone is a single network failure domain that hosts part or all of a Vault cluster. Examples of availability zones include:
- An isolated datacenter
- An isolated cage in a datacenter, if it is isolated from other cages by all other means (power, network, etc.)
- An "Availability Zone" in AWS or Azure; a "Zone" in GCP
A region is a collection of one or more availability zones on a low-latency network. Regions are typically separated by significant distances. A region could host one or more Vault clusters, but a single Vault cluster would not be spread across multiple regions due to network latency issues.
Autoscaling is the process of automatically scaling computational resources based on service activity. Autoscaling may be either horizontal, meaning to add more machines into the pool of resources, or vertical, meaning to increase the capacity of existing machines.
Each major cloud provider offers a managed autoscaling service:
|Cloud||Managed Autoscaling Service|
|AWS||Auto Scaling Groups|
|Azure||Virtual Machine Scale Sets|
|GCP||Managed Instance Groups|
A load balancer is a system that distributes network requests across multiple servers. It may be a managed service from a cloud provider, a physical network appliance, a piece of software, or a service discovery platform such as Consul.
Each major cloud provider offers one or more managed load balancing services:
|Cloud||Layer||Managed Load Balancing Service|
|AWS||Layer 4||Network Load Balancer|
|AWS||Layer 7||Application Load Balancer|
|Azure||Layer 4||Azure Load Balancer|
|Azure||Layer 7||Azure Application Gateway|
|GCP||Layer 4/7||Cloud Load Balancing|