This document applies to Vault Enterprise versions 1.6 - 1.8.
Vault Enterprise provides features for replicating data between Vault clusters for performance, availability, and disaster recovery purposes. In this tutorial, you will architect your Vault clusters according to HashiCorp recommended patterns and practices for replicating data.
»Multiple region deployment
»Resilience against region failure reference diagram
In this scenario, the clusters are replicated to guard against a full region failure. There are three Performance Replica Vault clusters (clusters A, B, C) each with its own DR cluster (clusters D, E, F) in a different Region. Each cluster has its associated Consul cluster for storage backend.
This architecture allows for n-2 at the region level provided all secrets and secrets engines are replicated across all clusters. Failure of the full Region 1 would require the DR cluster F to be promoted. Once this was done the Vault solution would be fully functional with some loss of redundancy till the Region 1 was restored. Applications would not have to re-authenticate as the DR cluster for each failed cluster would contain all leases and tokens.
»Resilience against cluster failure reference diagram
This solution has full resilience at a cluster level, but does not guard against region failure as the DR clusters for the Performance replicas are in the same region. There are certain use-cases where this is the preferred method where data cannot be replicated to other regions due to governance restrictions such as GDPR. Also some infrastructure frameworks may not have the ability to route application traffic to different Regions.
In these architectures, the Vault cluster is illustrated as a single entity, and would be one of the single clusters detailed above based on your number of Availability Zones. Multiple Vault clusters acting as a single Vault solution and replicating between them is available in Vault Enterprise only. OSS Vault can be set up in multiple clusters, but they would each be individual Vault solutions and would not support replication between clusters. The Vault documentation provides more detailed information on the replication capabilities within Vault Enterprise.
Vault performance replication allows for secrets management across many sites. Static secrets, authentication methods, and authorization policies are replicated to be active and available in multiple locations; however, leases, tokens and dynamic secrets are not.
NOTE: Refer to the Vault Mount Filter tutorial about filtering out secret engines from being replicated across regions.
»Disaster recovery replication
Vault disaster recovery replication ensures that a standby Vault cluster is kept synchronized with an active Vault cluster. This mode of replication includes data such as ephemeral authentication tokens, time-based token information as well as token usage data. This provides for aggressive recovery point objective in environments where preventing loss of ephemeral operational data is of the utmost concern. In any enterprise solution, Disaster Recovery Replicas are considered essential.
NOTE: Refer to the Vault Disaster Recovery Setup tutorial for additional information.
»Corruption or sabotage disaster recovery
Another common scenario to protect against, more prevalent in cloud environments that provide very high levels of intrinsic resiliency, might be the purposeful or accidental corruption of data and configuration, and/or a loss of cloud account control. Vault's DR Replication is designed to replicate live data, which would propagate intentional or accidental data corruption or deletion. To protect against these possibilities, you should implement regular backups of Vault's storage backend.
»Backups with the Integrated Storage backend
You can take regular snapshots of Vault's Integrated Storage backend using the Raft Snapshot API. New infrastructure can be restored from a Raft snapshot. Refer to the Raft storage tutorial for step-by-step instructions.
»Backups with the Consul storage backend
You can use the Consul Snapshot command to take snapshots of a Consul storage backend. These snapshots can be used to restore Consul storage to a new datacenter.
There is no set limit on the number of clusters within a replication set. The largest deployments today are in the 30+ cluster range. A Performance Replica cluster can have a Disaster Recovery cluster associated with it and can also replicate to multiple Disaster Recovery clusters.
While a Vault cluster can possess a replication role (or roles), there are no special considerations required in terms of infrastructure, and clusters can assume (or be promoted or demoted) to another role. Special circumstances related to mount filters and Hardware Security Module (HSM) usage may limit swapping of roles, but those are based on specific organization configurations.
»Considerations related to HSM integration
Using replication with Vault clusters integrated with Hardware Security Module (HSM) or cloud auto-unseal devices for automated unseal operations has some details that should be understood during the planning phase.
- If a performance primary cluster uses an HSM, all other clusters within that replication set should use an HSM as well.
- If a performance primary cluster does NOT use an HSM (uses Shamir secret sharing method), the clusters within that replication set can be mixed, such that some may use an HSM, others may use Shamir.
This behavior is by design. A downstream Shamir cluster presents a potential attack vector in the replication group since a threshold of key holders could recreate the master key; therefore, bypassing the upstream HSM key protection.
NOTE: As of Vault 1.3, the master key is encrypted with shared keys and stored on disk similar to how a master key is encrypted by the auto-unseal key and stored on disk. This provides consistent behavior whether you are using the Shamir's Secret Sharing algorithm or auto-unseal, and it allows all three scenarios above to be valid. However, if Vault is protecting data subject to governance and regulatory compliance requirements, it is recommended that you implement a downstream HSM for auto-unseal.