Preflight Checklist - Migrating to Integrated Storage

Who should read this guide?

You should read this guide if you currently run a Vault environment that uses an external system, such as HashiCorp Consul, to persist Vault's encrypted data, and you are considering migrating to Vault's integrated storage.

Topics covered

Understand the architectural differences

It is important to understand the differences between a Vault cluster with an external storage backend and a cluster using the integrated storage.

Reference architecture with Consul

The recommended number of Vault instances in a cluster is 3. The Vault cluster connects to a Consul cluster, which may have 5 or more nodes, as shown in the diagram below. (This totals at least 8 virtual machines to host a Vault HA environment.)

Reference Diagram

The processing requirements depend on the encryption and messaging workloads. Memory requirements depend on the total size of the secrets held in memory. The Vault server itself has minimal storage requirements, but the Consul nodes should have a relatively high-performance hard disk system.

Reference architecture with integrated storage

The recommended number of Vault instances is 5 in a cluster. In a single HA cluster, all Vault nodes share the data while an active node holds the lock; therefore, only the active node has write access. To achieve n-2 redundancy (meaning that the cluster can still function after losing 2 nodes), an ideal size for a Vault HA cluster is 5 nodes.
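The 5-node recommendation follows from majority-quorum arithmetic. The sketch below assumes the usual Raft rule that a cluster stays available as long as a strict majority of its voting nodes is reachable; the function names are illustrative and are not part of Vault.

```python
def raft_quorum(nodes: int) -> int:
    """Return the smallest strict majority of a cluster with `nodes` voters."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def failure_tolerance(nodes: int) -> int:
    """Return how many nodes can be lost while a majority still remains."""
    return nodes - raft_quorum(nodes)


for n in (3, 5, 7):
    print(f"{n} nodes: quorum={raft_quorum(n)}, tolerates {failure_tolerance(n)} lost node(s)")
```

Under this assumption, a 3-node cluster tolerates the loss of 1 node, while a 5-node cluster tolerates the loss of 2, which matches the n-2 redundancy described above.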

Reference Diagram Details

Because the data gets persisted on the same host, the Vault server should be hosted on a relatively high-performance hard disk system.

Consul vs. Integrated Storage

The integrated storage eliminates the need for external storage; therefore, Vault is the only software you need to stand up a cluster. This means that the host machine must have disk capacity equal to or greater than that of the existing external storage backend.
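One way to sanity-check the disk capacity requirement before migrating is to compare the free space on the target volume with the size of the existing data set. The following is a minimal sketch using only the Python standard library; the `headroom` factor and the example sizes are assumptions chosen for illustration, not values recommended by Vault.

```python
import shutil


def has_enough_disk(path: str, required_bytes: int, headroom: float = 1.0) -> bool:
    """Return True if the volume holding `path` has enough free space.

    `required_bytes` is the size of the existing storage backend's data.
    `headroom` is a hypothetical safety multiplier (1.0 means no extra margin).
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * headroom


# Example: check the current directory against a hypothetical 20 GB data set.
# In practice, `path` would be the directory where Vault will store its data.
existing_data_bytes = 20 * 1024**3
print(has_enough_disk(".", existing_data_bytes, headroom=2.0))
```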

System requirements comparison

The fundamental difference between Vault's integrated storage and Consul is that the integrated storage stores everything on disk, while Consul KV stores everything in memory, which increases the host's RAM requirements.

Machine sizes for Vault - Consul as its storage backend

It is recommended to avoid hosting Consul on an instance with burstable CPU.

| Size  | CPU      | Memory       | Disk  | Typical Cloud Instance Types |
|-------|----------|--------------|-------|------------------------------|
| Small | 2 core   | 4-8 GB RAM   | 25 GB | AWS: m5.large; Azure: Standard_D2_v3; GCE: n1-standard-2, n1-standard-4 |
| Large | 4-8 core | 16-32 GB RAM | 50 GB | AWS: m5.xlarge, m5.2xlarge; Azure: Standard_D4_v3, Standard_D8_v3; GCE: n1-standard-8, n1-standard-16 |

Machine sizes for Vault with integrated storage

Vault's integrated storage is disk-bound; therefore, the recommendation is to choose an instance type with an SSD that does not use burstable IOPS.

| Size  | CPU      | Memory       | Disk   | Typical Cloud Instance Types |
|-------|----------|--------------|--------|------------------------------|
| Small | 2 core   | 8-16 GB RAM  | 100 GB | AWS: m5.large, m5.xlarge; Azure: Standard_D2_v3, Standard_D4_v3; GCE: n2-standard-2, n2-standard-4 |
| Large | 4-8 core | 32-64 GB RAM | 200 GB | AWS: m5.2xlarge, m5.4xlarge; Azure: Standard_D8_v3, Standard_D16_v3; GCE: n2-standard-8, n2-standard-16 |

If many secrets are being generated or rotated frequently, this information will need to be flushed to the disk often. Therefore, the infrastructure should have a relatively high-performance hard disk system when using the integrated storage.

Performance considerations

Because Consul KV is memory-bound, it is necessary to take snapshots frequently. Vault's integrated storage, however, persists everything on disk, which eliminates the need for such frequent snapshots and reduces the performance cost they introduce. You should still take snapshots to back up the data so that you can restore it in case of data loss.
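As an illustration of a manual backup step, the sketch below wraps the `vault operator raft snapshot save` command in a small Python helper that writes to a timestamped file. The file naming scheme and helper names are assumptions made for this example, and the command only succeeds when the `vault` binary is installed and the environment (for example `VAULT_ADDR` and a valid token) already points at a cluster using integrated storage.

```python
import shutil
import subprocess
from datetime import datetime, timezone


def snapshot_filename(now=None):
    """Build a timestamped snapshot file name (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("vault-raft-%Y%m%dT%H%M%SZ.snap")


def snapshot_command(out_path):
    """Return the CLI invocation that saves a snapshot to `out_path`."""
    return ["vault", "operator", "raft", "snapshot", "save", out_path]


def take_snapshot():
    """Run the snapshot command; return the file name, or None if vault is missing."""
    if shutil.which("vault") is None:
        return None
    out_path = snapshot_filename()
    subprocess.run(snapshot_command(out_path), check=True)
    return out_path


# Show the command that would be run, without executing it.
print(" ".join(snapshot_command(snapshot_filename())))
```

A scheduler such as cron could call `take_snapshot()` at whatever interval your backup policy requires.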

Consul's autopilot feature allows for automatic, operator-friendly management of Consul servers including cleanup of dead servers, monitoring the state of the Consul Raft cluster, and stable server introduction. The autopilot feature is currently not available in Vault's integrated storage.

Inspect Vault data

To inspect the Vault data that are stored in Consul, use the consul kv command to retrieve them, or you can use the Consul UI.
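For example, the keys that Vault has written can be listed with `consul kv get -keys`. The sketch below builds that command and runs it only on request. It assumes Vault's Consul storage uses the default `vault/` path prefix and that the `consul` binary can reach the cluster (including any ACL token it needs); the helper names are illustrative. Note that the stored values are encrypted by Vault, so only the key names are human-readable.

```python
import shutil
import subprocess


def consul_kv_keys_command(prefix="vault/"):
    """Return the CLI invocation that lists keys under `prefix` without values."""
    return ["consul", "kv", "get", "-keys", prefix]


def list_vault_keys(prefix="vault/"):
    """Run the command and return one key per list entry; empty if consul is missing."""
    if shutil.which("consul") is None:
        return []
    result = subprocess.run(
        consul_kv_keys_command(prefix),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


# Show the command that would be run, without executing it.
print(" ".join(consul_kv_keys_command()))
```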

When you are using the integrated storage, data inspection can be harder. Currently, the vault operator raft command does not have a subcommand designed for data inspection. The Metric UI introduced in Vault 1.4 only shows the number of tokens, HTTP requests, and entities.

Summary

The table below highlights the differences between Consul and integrated storage.

| Consideration       | Consul as storage backend                | Vault integrated storage            |
|---------------------|------------------------------------------|-------------------------------------|
| System requirement  | Memory optimized machine                 | Storage optimized High IOPS machine |
| Data snapshot       | Frequent snapshots                       | Normal data backup strategy         |
| Snapshot automation | Snapshot agent (Consul Enterprise only)  | No out-of-the-box automation tool   |
| Data inspection     | Use consul kv command                    | Limited API endpoints               |
| Autopilot           | Supported                                | Not available                       |

Self-check questions

  • Where is the product expertise?
    • Do you already have Consul expertise?
    • Are you concerned about lack of Consul knowledge?
  • Do you currently experience any technical issue with Consul?
  • What motivates the data migration from the current storage backend to the integrated storage?
    • Reduce the operational overhead?
    • Reduce the number of machines to run?
    • Reduce the cloud infrastructure cost?
  • Do you have a staging environment where you can run production loads and verify that everything works as you expect?
  • Have you thought through the storage backup process or workflow after migrating to the integrated storage?
  • Do you currently rely heavily on using Consul to inspect Vault data?

Next step

If you are ready to migrate the current storage backend to integrated storage, refer to the Storage Migration Guide - Consul to Integrated Storage.

To deploy a new cluster with integrated storage, refer to the Vault HA Cluster with Integrated Storage guide.