The goal of this document is to recommend HashiCorp Vault deployment practices. This reference architecture conveys a general architecture, leveraging the raft storage backend, that should be adapted to accommodate the specific needs of each implementation.
Using Raft as the storage backend eliminates reliance on third-party systems; it also implements high availability (HA), supports Enterprise Replication features, and provides backup/restore workflows.
The following topics are addressed in this guide:
- Design Summary
- Recommended Architecture
- Best Case Architecture
- Vault Replication (Enterprise)
- Deployment System Requirements
- Load Balancing
- External Token Storage
- Additional References
A Vault cluster is a set of Vault processes that together run a Vault service. These Vault processes could be running on physical or virtual servers, or in containers.
An availability zone is a single failure domain at the location level that hosts part or all of a Vault cluster. The round-trip latency between availability zones should be less than 8 milliseconds. A single Vault cluster may be spread across multiple availability zones. Examples of an availability zone in this context are:
- An isolated datacenter
- An isolated cage in a datacenter, if it is isolated from other cages by all other means (power, network, etc.)
- An availability zone in AWS, Azure or GCP
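The 8 ms round-trip budget above can be spot-checked before nodes are placed. The sketch below times TCP handshakes as a rough RTT proxy (one handshake takes about one round trip); it assumes the peer is reachable on Vault's cluster port (8201 by default) and is a quick sanity check, not a substitute for proper network measurement.

```python
import socket
import statistics
import time

# Inter-AZ round-trip budget from the guidance above.
MAX_RTT_MS = 8.0


def tcp_connect_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake in milliseconds (a rough RTT proxy)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def median_rtt_ms(host: str, port: int, samples: int = 10) -> float:
    """Take several samples and return the median to smooth out outliers."""
    return statistics.median(tcp_connect_rtt_ms(host, port) for _ in range(samples))


def within_az_latency_budget(rtt_ms: float, limit_ms: float = MAX_RTT_MS) -> bool:
    """True if the measured round trip is below the availability zone budget."""
    return rtt_ms < limit_ms
```

For example, `within_az_latency_budget(median_rtt_ms("vault-az2.example.internal", 8201))` checks a candidate node in another zone; the host name is hypothetical.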
A region is a geographically separate collection of one or more availability zones. A region may host one or more Vault clusters. There is no defined maximum latency requirement between regions in Vault architecture. A single Vault cluster should not be spread across multiple regions.
This design is the recommended architecture for production environments, as it provides flexibility and resilience. When using the Raft storage backend, resilience decisions for Vault must account for the need to maintain quorum under the Raft protocol.
»Network Connectivity Details
The following table outlines the network traffic requirements for Vault cluster nodes.
|Source||Destination||Port||Protocol||Direction||Purpose|
|Vault clients||Vault servers||8200||tcp||incoming||Vault API|
|Vault servers||Vault servers||8201||tcp||bidirectional||Vault replication traffic, request forwarding|
|Vault servers||Vault servers||8201||tcp||bidirectional||Raft gossip traffic|
Vault is designed to handle different failure scenarios that have different probabilities. When deploying a Vault cluster, consider the failure tolerance you require and design for it. In OSS Vault, the recommended number of instances in a cluster is 5, as any more would add limited value.
In Vault Enterprise, the recommended number is also 5 in a cluster, but more can be used if they are performance standbys that help with the read-only workload. When using this feature, it is also advisable to configure the performance standby nodes as non-voting nodes.
NOTE: For versions prior to 1.7, when recovering from failure conditions, if nodes are replaced rather than repaired, it is necessary to use the remove-peer endpoint to remove the old node from the cluster. This protects the quorum requirements of the cluster by proactively removing the node that is no longer in use from the cluster list. This is also advisable for upgrade procedures where nodes are replaced instead of upgraded in place.
Versions 1.7 and later support autopilot, whose dead server cleanup, once enabled, makes calls to remove-peer unnecessary.
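On versions prior to 1.7, the remove-peer call can be made over the HTTP API. The sketch below only builds the request; it assumes the `sys/storage/raft/remove-peer` endpoint with a `server_id` field and a token with sufficient privileges. The address, token, and node ID used with it are placeholders.

```python
import json
import urllib.request


def build_remove_peer_request(vault_addr: str, token: str, server_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST that removes a replaced node from Raft."""
    url = vault_addr.rstrip("/") + "/v1/sys/storage/raft/remove-peer"
    body = json.dumps({"server_id": server_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(req)`; the CLI equivalent is `vault operator raft remove-peer <server_id>`. Building the request separately keeps the destructive step explicit and easy to review.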
Beginning with version 1.7, Vault Integrated Storage supports autopilot, which includes the following features:
- Server Stabilization: A newly added node that doesn't perform well can disrupt the stability of the Raft quorum. Therefore, when a new node is added, autopilot initially adds it as a non-voter. If the new node remains healthy during the stabilization period and passes all the health checks, it is then promoted to a voter. As a result, an unstable new node cannot affect the Raft quorum.
- Dead Server Cleanup: Autopilot periodically cleans up failed servers from the Raft configuration. This includes voter and non-voter nodes that are determined to be dead.
- Health Awareness: Autopilot determines the health of the cluster nodes from measures such as trailing logs and last contact time, and reports the health of all nodes in the cluster via the API.
Refer to the Raft configuration documentation for specifics on configuration.
NOTE: Vault autopilot will not promote a non-voter to a voter when a voter fails. That capability is provided by Consul's Redundancy Zones feature, which is not yet implemented in Vault with Integrated Storage (Raft).
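As an illustration of the autopilot configuration mentioned above, the sketch below builds a request that enables dead server cleanup. It assumes the `sys/storage/raft/autopilot/configuration` endpoint and the parameter names `cleanup_dead_servers`, `dead_server_last_contact_threshold`, `min_quorum`, and `server_stabilization_time`; the values are illustrative, not recommendations, so confirm names and defaults against the Raft autopilot API documentation for your Vault version. The guard on `min_quorum` is this sketch's own check.

```python
import json
import urllib.request


def build_autopilot_config_request(
    vault_addr: str,
    token: str,
    *,
    cleanup_dead_servers: bool = True,
    dead_server_last_contact_threshold: str = "24h",  # illustrative value
    min_quorum: int = 3,  # illustrative value
    server_stabilization_time: str = "10s",  # illustrative value
) -> urllib.request.Request:
    """Build (but do not send) a POST that updates the autopilot configuration."""
    # Sketch-level guard: pruning dead servers below 3 voters leaves no
    # room to tolerate any further failure.
    if cleanup_dead_servers and min_quorum < 3:
        raise ValueError("min_quorum should be at least 3 when cleanup_dead_servers is enabled")
    payload = {
        "cleanup_dead_servers": cleanup_dead_servers,
        "dead_server_last_contact_threshold": dead_server_last_contact_threshold,
        "min_quorum": min_quorum,
        "server_stabilization_time": server_stabilization_time,
    }
    return urllib.request.Request(
        vault_addr.rstrip("/") + "/v1/sys/storage/raft/autopilot/configuration",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```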
The Vault software provides failure tolerance at the node level by replicating data within the cluster. In a single HA Vault cluster, all nodes share the same underlying storage backend and therefore the same data. One of the Vault servers obtains a lock within the data store to become the active node, which has write access. If the leader is lost at any time, another Vault node seamlessly takes its place as the leader. To achieve n-2 redundancy (where the loss of 2 objects within the failure domain can be tolerated), the ideal size for a Vault cluster using Raft storage is 5.
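The size of 5 follows from the standard Raft majority rule. Here $n$ counts voting nodes only (non-voters, such as performance standbys configured as non-voting nodes, are excluded), $q$ is the quorum, and $f$ is the number of node failures tolerated:

$$
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f(n) = n - q(n) = \left\lfloor \frac{n-1}{2} \right\rfloor
$$

$$
q(3) = 2,\ f(3) = 1; \qquad q(5) = 3,\ f(5) = 2; \qquad q(6) = 4,\ f(6) = 2
$$

A five-node cluster therefore tolerates two failed nodes (n-2), while a sixth voter raises the quorum without increasing the number of failures tolerated.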
Typical distribution in a cloud environment is to spread Vault nodes across separate Availability Zones (AZs) within a high-bandwidth, low-latency network, such as an AWS Region; however, this may not be possible in a datacenter (DC) installation where only one DC is available within the required latency.
The move towards greater use of highly distributed systems such as Raft has changed some requirements and best practices, and it is important to understand them. Environments composed of distributed systems require a shift in the redundancy coefficient of their underlying components. Raft relies on consensus negotiation to organize and replicate information, so the environment must provide 3 unique resilient paths to deliver meaningful reliability. Essentially, a consensus system requires a simple majority of nodes to be available at all times: with 3 nodes, 2 must be available. If those 3 nodes are placed in two failure domains, one domain must host 2 of them, so there is a 50% chance that losing a single failure domain results in a complete outage.
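The 50% figure can be reproduced by checking, for each zone, whether the surviving nodes still form a majority. This is a minimal sketch that assumes every listed node is a voter and that each zone is equally likely to be the one that fails.

```python
from fractions import Fraction
from typing import Sequence


def quorum(voters: int) -> int:
    """Raft quorum: a simple majority of the voting nodes."""
    return voters // 2 + 1


def az_loss_outage_fraction(nodes_per_az: Sequence[int]) -> Fraction:
    """Fraction of single-AZ failures that leave the cluster without quorum."""
    total = sum(nodes_per_az)
    needed = quorum(total)
    # An AZ failure is an outage if the surviving nodes fall below quorum.
    outages = sum(1 for lost in nodes_per_az if total - lost < needed)
    return Fraction(outages, len(nodes_per_az))


# 3 nodes in two failure domains: the 50% case described above.
print(az_loss_outage_fraction([2, 1]))     # 1/2
# 5 nodes in one AZ: losing that AZ is always an outage.
print(az_loss_outage_fraction([5]))        # 1
# 5 nodes across three AZs: no single AZ failure breaks quorum.
print(az_loss_outage_fraction([2, 2, 1]))  # 0
```

Only the three-zone layout reaches zero, which is the reasoning behind the recommended architecture.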
To protect against failure at the region level, as well as to provide additional geographic scaling capabilities, Vault Enterprise offers:
- Disaster Recovery (DR) Replication
- Performance Replication
Please see the Recommended Patterns on Vault Replication for a full description of these options.
Because of the constraints listed above, the recommended architecture for Vault is to distribute nodes across three availability zones within a cluster and for clusters to be replicated across regions using DR and Performance replication. There are also several “Best Case” architecture solutions for one and two Availability Zones. These are not the recommended architecture but are the best solutions if your deployment is restricted by the number of availability zones.
The architecture below is the recommended best approach to deployment of a single Vault cluster and should be the target architecture for any installation.
»Deployment of Vault in three Availability Zones
In this scenario, the nodes in the Vault cluster are distributed across three Availability Zones. This solution provides n-2 redundancy at the node level and n-1 redundancy at the Availability Zone level, and as such is considered the most resilient of all architectures for a single Vault cluster using the integrated storage backend in the OSS product.
»Multiple Region Deployment (Enterprise)
The recommended architecture for multiple Vault clusters to allow for regional, performance and disaster recovery remains the same as what is described in our standard Recommended Architecture guide (with the exception that you would not need to deploy any Consul clusters).
»Best Case Architecture
In some deployments, there may be insurmountable restrictions that mean the recommended architecture is not possible. This could be due to a lack of availability zones, as an example. In these cases, the architectures below detail the best case options available.
NOTE: In the following architectures, the Raft leader could be any of the five Raft server nodes.
»Deployment of Vault in one Availability Zone
In this scenario, all nodes in the Vault cluster are hosted within one Availability Zone. This solution has a single point of failure at the Availability Zone level, but n-2 redundancy at the node level for Vault. This is not HashiCorp's recommended architecture for production systems, since there is no redundancy at the Availability Zone level. There is also no DR capability, so at a minimum this cluster should have a DR replica in a separate region.
»Deployment of Vault in two Availability Zones
In this scenario, the nodes in the Vault cluster are split across two Availability Zones. This solution has n-2 redundancy at the node level and n-1 at the Availability Zone level, but the addition of an Availability Zone does not significantly increase the availability of the Vault cluster. This is because the Raft protocol requires a quorum of ⌊n/2⌋ + 1 nodes, and if Zone A (the zone hosting the majority of the nodes) were to fail in the above diagram, the cluster would lose quorum and therefore fail. This is not HashiCorp's recommended architecture for production systems, since there is only partial redundancy at the Availability Zone level and an Availability Zone failure may or may not result in an outage.
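As a worked example, assume the five nodes are split with three in Zone A and two in Zone B (the exact split is an assumption here; any split across two zones leaves one zone holding a majority):

$$
q(5) = \left\lfloor \tfrac{5}{2} \right\rfloor + 1 = 3
$$

$$
\text{Zone A lost: } 5 - 3 = 2 < 3 \;\Rightarrow\; \text{no quorum}, \qquad \text{Zone B lost: } 5 - 2 = 3 \ge 3 \;\Rightarrow\; \text{quorum retained}
$$

Only one of the two possible zone failures causes an outage, which is why the second zone adds only partial redundancy.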
»Vault Replication (Enterprise)
In these architectures, the "Vault Cluster" is illustrated as a single entity, and would be one of the single clusters detailed above based on your number of Availability Zones. Multiple Vault clusters acting as a single Vault solution and replicating between them is available in Vault Enterprise only. OSS Vault can be set up in multiple clusters, but they would each be individual Vault solutions and would not support replication between clusters. The Vault documentation provides more detailed information on the replication capabilities within Vault Enterprise.
»Performance Replication
Vault performance replication allows for secrets management across many sites. Static secrets, authentication methods, and authorization policies are replicated to be active and available in multiple locations; however, leases, tokens, and dynamic secrets are not.
NOTE: Refer to the Vault Mount Filter tutorial about filtering out secret engines from being replicated across regions.
»Disaster Recovery Replication
Vault disaster recovery replication ensures that a standby Vault cluster is kept synchronized with an active Vault cluster. This mode of replication includes data such as ephemeral authentication tokens, time-based token information, and token usage data. This provides an aggressive recovery point objective in environments where preventing the loss of ephemeral operational data is of the utmost concern. In any enterprise solution, Disaster Recovery Replicas are considered essential.
NOTE: Refer to the Vault Disaster Recovery Setup tutorial for additional information.
»Corruption or Sabotage Disaster Recovery
Another common scenario to protect against, more prevalent in cloud environments that provide very high levels of intrinsic resiliency, is the purposeful or accidental corruption of data and configuration, and/or a loss of cloud account control. Vault's DR Replication is designed to replicate live data, so it would also propagate intentional or accidental data corruption or deletion. To protect against these possibilities, you should back up Vault's storage backend. This is supported through the Raft Snapshot feature, which can be automated for regular archival backups. A cold site or new infrastructure could be re-hydrated from a Raft snapshot.
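A scheduled job can take those archival snapshots and keep a rolling window of them. This is a minimal sketch: it assumes the `GET /v1/sys/storage/raft/snapshot` endpoint and a token allowed to read it, and the file naming scheme and retention count are hypothetical choices made for this example.

```python
import shutil
import time
import urllib.request
from pathlib import Path
from typing import List

SNAPSHOT_GLOB = "vault-raft-*.snap"  # hypothetical naming scheme


def save_raft_snapshot(vault_addr: str, token: str, dest_dir: Path) -> Path:
    """Stream a Raft snapshot from Vault into a timestamped file in dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamps in the name make lexical order match chronological order.
    target = dest_dir / time.strftime("vault-raft-%Y%m%dT%H%M%SZ.snap", time.gmtime())
    req = urllib.request.Request(
        vault_addr.rstrip("/") + "/v1/sys/storage/raft/snapshot",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target


def prune_snapshots(dest_dir: Path, keep: int) -> List[Path]:
    """Delete all but the newest `keep` snapshots and return the deleted paths."""
    snaps = sorted(dest_dir.glob(SNAPSHOT_GLOB))
    doomed = snaps[:-keep] if keep > 0 else snaps
    for path in doomed:
        path.unlink()
    return doomed
```

Run from cron or a systemd timer, `save_raft_snapshot` followed by `prune_snapshots(dest, keep=14)` keeps a rolling archive; the CLI equivalent of the download step is `vault operator raft snapshot save <file>`. Because the goal is recovery from corruption or account loss, the archive should be copied somewhere outside the environment it protects.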
There is no set limit on the number of clusters within a replication set; the largest deployments today are in the 30+ cluster range. A Performance Replica cluster can have a Disaster Recovery cluster associated with it and can also replicate to multiple Disaster Recovery clusters. While a Vault cluster can hold a replication role (or roles), no special infrastructure considerations are required, and clusters can assume another role by being promoted or demoted. Special circumstances related to mount filters and HSM usage may limit swapping of roles, depending on the organization's specific configuration.
»Considerations Related to HSM Integration
Using replication with Vault clusters integrated with Hardware Security Module (HSM) or cloud auto-unseal devices for automated unseal operations has some details that should be understood during the planning phase.
- If a performance primary cluster uses an HSM, all other clusters within that replication set should use an HSM as well.
- If a performance primary cluster does NOT use an HSM (uses Shamir secret sharing method), the clusters within that replication set can be mixed, such that some may use an HSM, others may use Shamir.
This behavior is by design. A downstream Shamir cluster presents a potential attack vector in the replication group, since a threshold of key holders could recreate the master key, thereby bypassing the upstream HSM key protection.
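During planning, the two rules above can be encoded as a simple inventory check. This is a hypothetical helper for reviewing a planned replication set, not a Vault feature; the `ClusterSeal` record and its fields are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSeal:
    name: str
    uses_hsm: bool  # True if the cluster unseals via an HSM, False for Shamir


def seal_mismatch_warnings(primary: ClusterSeal, others: List[ClusterSeal]) -> List[str]:
    """Flag Shamir clusters downstream of an HSM-backed performance primary."""
    if not primary.uses_hsm:
        # A Shamir primary may be mixed with HSM and Shamir downstream clusters.
        return []
    return [
        f"{c.name} uses Shamir but performance primary {primary.name} uses an HSM"
        for c in others
        if not c.uses_hsm
    ]
```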
NOTE: As of Vault 1.3, the master key is encrypted with shared keys and stored on disk similar to how a master key is encrypted by the auto-unseal key and stored on disk. This provides consistent behavior whether you are using the Shamir's Secret Sharing algorithm or auto-unseal, and it allows all three scenarios above to be valid. However, if Vault is protecting data subject to governance and regulatory compliance requirements, it is recommended that you implement a downstream HSM for auto-unseal.
»Deployment System Requirements
The following table provides guidelines for server sizing. Of particular note is the strong recommendation to avoid non-fixed performance CPUs, or "Burstable CPU" in AWS terms, such as T-series instances. Additionally, use non-burstable SSDs for disks.
»Sizing for Vault Servers
|Size||CPU||Memory||Disk||Typical Cloud Instance Types|
|Small||2 core||8-16 GB RAM||100 GB||AWS: m5.large, m5.xlarge; Azure: Standard_D2_v3, Standard_D4_v3; GCE: n2-standard-2, n2-standard-4|
|Large||4-8 core||32-64 GB RAM||200 GB||AWS: m5.2xlarge, m5.4xlarge; Azure: Standard_D8_v3, Standard_D16_v3; GCE: n2-standard-8, n2-standard-16|
The small size category would be appropriate for most initial production deployments, development or testing environments.
The large size is for production environments where there is a consistent high workload. That might be a large number of transactions, a large number of secrets, or a combination of the two.
In general, processing requirements will be dependent on encryption workload and messaging workload (operations per second, and types of operations). Memory requirements will be dependent on the total size of secrets/keys stored in memory and should be sized according to that data (as should the hard drive storage).
Vault itself has minimal storage requirements when not using integrated storage (raft). However, when using integrated storage the infrastructure should have a relatively high-performance hard disk subsystem, hence the non-burstable SSD requirement. If many secrets are being generated or rotated frequently, this information will need to flush to disk often and can impact performance if slower hard drives are used.
Furthermore, network throughput is a common consideration for Vault servers. As Vault is HTTPS API driven, all incoming requests, communications between Vault cluster members, communications with external systems (per auth or secret engine configuration, and some audit logging configurations), and responses consume network bandwidth. Replication of Vault datasets across network boundaries should be achieved through Performance or DR Replication.
Vault Production Hardening Recommendations provides guidance on best practices for a production hardened deployment of Vault.
»Load Balancing Using External Load Balancer
External load balancers are supported as an entry point to a Vault cluster. The external load balancer should poll the sys/health endpoint to detect the active node and route traffic accordingly. The load balancer should be configured to make an HTTP request for the following URL to each node in the cluster:
http://<Vault Node URL>:8200/v1/sys/health
The active Vault node will respond with a 200, while the standby nodes will return a 4xx response.
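The routing decision can be sketched as a small probe. The status table lists the default `sys/health` codes as this sketch understands them (each can be overridden with query parameters); verify them against the `sys/health` API documentation for your version. The probe reads the code from the exception because urllib raises `HTTPError` for non-2xx replies.

```python
import urllib.error
import urllib.request

# Default sys/health status codes (overridable via query parameters).
HEALTH_STATUS = {
    200: "active",
    429: "standby",
    472: "dr secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status: int) -> str:
    """Map a sys/health status code to a node state."""
    return HEALTH_STATUS.get(status, "unknown")


def should_route_to(status: int) -> bool:
    """Send client traffic only to the node that reports itself active."""
    return status == 200


def probe(node_url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of <node_url>/v1/sys/health."""
    try:
        with urllib.request.urlopen(node_url.rstrip("/") + "/v1/sys/health", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Standby, sealed, and similar states arrive as non-2xx responses.
        return err.code
```

A load balancer health check expresses the same rule declaratively: mark a node healthy only when the check returns 200.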
»Client IP Address Handling
There are two supported methods for handling client IP addressing behind a proxy or load balancer: X-Forwarded-For headers and PROXY v1. Both require the trusted load balancer's IP addresses to be listed as allowed addresses, in line with security best practices.
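The reason for the allow list is that an X-Forwarded-For header is only trustworthy when it was written by a known proxy. The sketch below illustrates that trust check in isolation; it is a conceptual model, not Vault's implementation. In Vault itself this is configured on the TCP listener (for example `x_forwarded_for_authorized_addrs`, or `proxy_protocol_authorized_addrs` for PROXY v1). The sketch assumes a single trusted load balancer in front of Vault.

```python
import ipaddress
from typing import List, Optional


def client_ip(peer_addr: str, xff_header: Optional[str], trusted_cidrs: List[str]) -> str:
    """Return the client address to record for a request.

    X-Forwarded-For is honoured only when the TCP peer is a trusted load
    balancer; otherwise any client could spoof its address.
    """
    peer = ipaddress.ip_address(peer_addr)
    trusted = any(peer in ipaddress.ip_network(cidr) for cidr in trusted_cidrs)
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    if not trusted or not hops:
        return peer_addr
    # With one trusted load balancer, the right-most entry is the address it
    # observed; entries further left were supplied by the client.
    return str(ipaddress.ip_address(hops[-1]))
```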
»External Token Storage
The Tokenization transformation feature reached General Availability (GA) in Vault 1.7. This feature introduces additional architectural considerations explored below.
NOTE: Learn more about this feature in Tokenize Data with Transform Secrets Engine tutorial.
The Tokenization feature requires an external data store to facilitate the mapping of tokens to cryptographic values. Key considerations for achieving such an architecture:
- Vault depends on the availability of the external data store. Mirror your data stores as you do Vault.
- Architect your external data store installation for high availability (HA) and configure replication where applicable.
- Vault depends on the data in the external data store. Back up this data at the same cadence as your Vault data.
»Additional References
- Vault architecture documentation explains each Vault component.
- To integrate Vault with an existing LDAP server, refer to the LDAP Auth Method documentation.
- Refer to the AppRole Pull Authentication tutorial to programmatically generate a token for a machine or app.
- Integrated Storage reference materials.